
[RFC] UMA Universal Modular Accelerator Interface #60

Merged 29 commits on Jun 1, 2022

Conversation

MichaelJKlaiber

opening PR for UMA pre-RFC

@MichaelJKlaiber MichaelJKlaiber changed the title Rfc uma [RFC] UMA Universal Modular Accelerator Interface Mar 8, 2022
@manupak manupak left a comment

Thanks @MichaelJKlaiber for this work!

I have done a round of review.

One thing that is missing is how the UMA Partitioner operates (in the Reference-level explanation). It would be great to describe what exactly is being done using the registered patterns.

One thing I'd like answered here is what sort of control it will allow over the passes run there: MergeComposite, AnnotateTarget, MergeCompilerRegions and PartitionGraph.

For example, some backends might not want the compiler regions merged; how would that be controlled?

Also, how would one register post-partitioning passes?

UMA Partitioner:
* Register relay passes
* Register patterns - supported sub-graph operations
* Order: pre-partitioning passes, Graph partitioning, post-partitioning passes

For a new user/developer, it might be beneficial to explain where to position a relay pass between the "post-partitioning passes" and a "_register_relay_pass" -- which may not be obvious to those without a deeper understanding of TVM. I think this is mainly because post-partitioning passes run before the OptimizeImpl(...) sequence of passes in the core compiler.


For the pass registrations (_register_relay_pass and _register_tir_pass) we are following the idea of phases at which the passes are registered: _register_relay_pass(self, phase: int, relay_pass: tvm.transform.Pass) -> None. E.g., phase 0 would be pre-partitioning, phase 1 would be post-partitioning but before OptimizeImpl.

I agree, that the phases and their implications need proper documentation to help users decide where to place a pass.


Thanks!

Should we use well-defined enums instead?

@manupak manupak Mar 15, 2022

For the text here (where we make a decision at the end), I think we should enumerate the following options and highlight the reasoning behind the choice:

P1. Int-based: _register_relay_pass(self, phase: int, relay_pass: tvm.transform.Pass)
P2. Enum-based: _register_relay_pass(self, phase: tvm.transform.uma.Phase, relay_pass: tvm.transform.Pass)
P3. Separate registrations:
_register_pre_partition_relay_pass(self, relay_pass: tvm.transform.Pass)
_register_post_partition_relay_pass(self, relay_pass: tvm.transform.Pass)
_register_post_optimization_relay_pass(self, relay_pass: tvm.transform.Pass)


Should we use well-defined enums instead?

Yes, we were talking about this internally as well. Enums are probably the preferred solution. We will update the section with the options and add our reasoning of choice. Thanks for the great input!
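To make the enum-based option (P2) concrete, here is a minimal self-contained sketch of phase-keyed pass registration. The phase names, class name, and pass names are illustrative stand-ins, not the final UMA API:

```python
from enum import Enum, auto

class PassPhase(Enum):
    # Hypothetical phase names; the discussion so far only fixes
    # pre- and post-partitioning as distinct phases.
    PRE_PARTITIONING = auto()
    POST_PARTITIONING = auto()

class UMABackendSketch:
    """Toy stand-in for UMABackend, keeping passes grouped by phase."""

    def __init__(self):
        self._relay_passes = []

    def _register_relay_pass(self, phase, relay_pass):
        self._relay_passes.append((phase, relay_pass))

    def passes_for(self, phase):
        # Passes within a phase keep their registration order.
        return [p for ph, p in self._relay_passes if ph is phase]

backend = UMABackendSketch()
backend._register_relay_pass(PassPhase.PRE_PARTITIONING, "LegalizeOpsPass")
backend._register_relay_pass(PassPhase.POST_PARTITIONING, "BufferPlanningPass")
```

Compared with bare ints, the enum makes the intent of each registration site readable without consulting documentation.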

mod, params = relay.frontend.from_pytorch(scripted_model, [("input_data", input_shape)])

# Register a UMA backend
UltraTrailBackend().register()

Is this the only thing the user needs to do to register the backend?
Will it do something to the effect of:

TVM_REGISTER_TARGET_KIND("accelerator_B", kDLCPU)
    .set_attr<FTVMRelayToTIR>("RelayToTIR", relay::contrib::generic::RelayToTIR("accelerator_B"))
    .set_attr<FTVMTIRToRuntime>("TIRToRuntime", relay::contrib::generic::accelerator_B::TIRToRuntime);


Is this the only thing the user needs to do to register the backend? Will it do something to the effect of:

Yes.

TVM_REGISTER_TARGET_KIND("accelerator_B", kDLCPU)
    .set_attr<FTVMRelayToTIR>("RelayToTIR", relay::contrib::generic::RelayToTIR("accelerator_B"))
    .set_attr<FTVMTIRToRuntime>("TIRToRuntime", relay::contrib::generic::accelerator_B::TIRToRuntime);

backend.register interacts with TVM core in two ways:

  1. It registers the target hooks as you said.
  2. It registers the target pattern tables using tvm.relay.op.contrib.register.register_pattern_table and ensures that both use the same compiler/target_name.
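The two interactions can be pictured with a plain-Python sketch, where dicts stand in for TVM's target-kind and pattern-table registries (all names here are illustrative):

```python
# Toy stand-ins for TVM's target-kind registry and pattern-table registry.
TARGET_KINDS = {}
PATTERN_TABLES = {}

class AcceleratorBBackend:
    target_name = "accelerator_B"  # name taken from the example above

    def register(self):
        # 1. Register the target kind with its RelayToTIR/TIRToRuntime hooks.
        TARGET_KINDS[self.target_name] = {
            "RelayToTIR": "relay_to_tir_hook",
            "TIRToRuntime": "tir_to_runtime_hook",
        }
        # 2. Register the pattern table under the *same* compiler/target name,
        #    mirroring register_pattern_table.
        PATTERN_TABLES[self.target_name] = [("accelerator_B.conv2d", "pattern")]

AcceleratorBBackend().register()
```

The key invariant the sketch encodes is that both registries are keyed by one shared compiler/target name.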


Hmmmmm.. TVM_REGISTER_TARGET_KIND is a C/C++ macro. Do we envision having some sort of Python decorator (as part of this work) to handle the registration?

Contributor Author

@manupa-arm: This part was removed. The C++ code you mention is no longer needed for target registration.

CC: @cgerum


Just a short addition regarding the implementation. We are using a global function RegisterTarget in C++ that takes the target name and registers the target together with the target hooks. RegisterTarget is called during the backend registration UltraTrailBackend().register(). To hide this process from the user we are not using a decorator, but I think it's a similar approach.

@manupak manupak Mar 17, 2022

That makes sense to me! It might be worth adding this to the Reference explanation :)

How/do we deal with target-specific options?

https://github.com/apache/tvm/blob/fe7b5d329a82f720a721356c40abd721cf1d780d/src/target/target_kind.cc#L373


I added it to the Reference explanation (link).

Currently it is not possible to set target-specific options. Since the UMA targets are essentially also "c" targets, we did not see the need to deal with target-specific options. Do you have a use case in mind for which this would be necessary?

@manupak manupak Apr 6, 2022

Well, we currently use such options to define accelerator variants that share the same lowering pipeline.
In the absence of that, we would need to resort to using the PassConfig; however, PassConfig is generally better suited to setting a configuration for a specific Pass. In the above case, it would mean we need to configure multiple Passes.

I would see the newly registered targets as extensions of the "c" target, and I'm a bit keen on not ending up having to dump a union of UMA target options into the "c" target.

Following your proposal, is there a reason why we won't be able to use RegisterTarget? We could consider including AttrDict to that effect.

Adding to this, there are two variants of these:

  • relay.ext.<backend>.options

These define the options for the lowering. This is inherited from the original BYOC design and we still use it with Target Hooks. This is partly due to the separate existence of kCompiler strings and actual targets.

  • target_kind options

This is what I alluded to in the previous comment.

Ideally, since UMA is wrapping Target Hooks, I suppose if we want to add this, we would want to proceed with the second option -- hence the suggestion.


Adding to this, there are two variants of these:

  • relay.ext.<backend>.options

These define the options for the lowering. This is inherited from the original BYOC design and we still use it with Target Hooks. This is partly due to the separate existence of kCompiler strings and actual targets.

Yes, this is not exactly user-friendly.

  • target_kind options

This is what I alluded to in the previous comment.

Ideally, since UMA is wrapping Target Hooks, I suppose if we want to add this, we would want to proceed with the second option -- hence the suggestion.

Adding them to the TargetKind would obviously be the preferred solution. It would allow for a clean target = Target("ethos-u -version=ethosu-256", host=Target("c")), but this makes the current partitioning flow somewhat clumsy, as we still need to run .partition for the target hooks. On the other hand, it could serve as a starting point for deprecating partition_for and providing a unified API for Collage and non-Collage flows. @mbs-octoml @areusch would that make sense?

@manupak manupak requested a review from Mousius March 10, 2022 11:22
areusch commented Mar 11, 2022

cgerum commented Mar 14, 2022

One thing I'd like answered here is what sort of control it will allow over the passes run there: MergeComposite, AnnotateTarget, MergeCompilerRegions and PartitionGraph.

So far we had planned to standardize on MergeComposite, AnnotateTarget, MergeCompilerRegions and PartitionGraph. To get a better overview, I extracted the partitioning flows of existing BYOC targets:

| BYOC Backend | Pre-Partition Passes | Partition | Post-Partition Passes |
|---|---|---|---|
| arm_compute_lib | InferType | MergeComposite, AnnotateTarget, PartitionGraph | |
| bnns | InferType, FoldConstant, FoldScaleAxis, DynamicToStatic, AlterOpLayout, FoldConstant | MergeComposite, AnnotateTarget, PartitionGraph | |
| cmsisnn | | MergeComposite, AnnotateTarget, PartitionGraph | GenerateCMSISNNConstants, ScalarToTensorConstants, ExtractConstantsFromPartitionedFunction |
| cutlass | SimplifyInference, FoldConstant, FoldScaleAxis | MergeComposite, AnnotateTarget, PartitionGraph | |
| dnnl | | MergeComposite, AnnotateTarget, MergeCompilerRegions, PartitionGraph | |
| ethosu | | MergeComposite, AnnotateTarget, MergeCompilerRegions, PartitionGraph | preprocess_ext_io |
| tensorrt | RemoveDropoutPass, RemoveUnusedFunctions, ConvertLayout, FoldConstant | AnnotateTarget, MergeCompilerRegions, PartitionGraph | |
| vitis_ai | RemoveUnusedFunctions, ConvertLayout, FoldConstant, InferType | ("VitisAIAnnotationPass"), MergeCompilerRegions, PartitionGraph | RemoveUnusedFunctions, ConvertLayout, FoldConstant |

Looking at the existing backends it might make sense to make MergeCompilerRegions optional. We probably do not want to support custom compiler annotations as used in vitis_ai target.

manupak commented Mar 15, 2022

@cgerum thanks for the detailed analysis!

I'm wondering whether we should provide an optional partitioning hook as well -- so it can be anything (i.e. any Sequential) -- and let the default be a Sequential of MergeComposite, AnnotateTarget, MergeCompilerRegions, PartitionGraph. WDYT?
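The default-plus-hook idea can be sketched in plain Python, with pass names as strings and an optional flag for the MergeCompilerRegions question raised earlier (function and parameter names are illustrative):

```python
# Default partitioning pipeline discussed in the thread, as an ordered list.
DEFAULT_PARTITION_PASSES = [
    "MergeComposite",
    "AnnotateTarget",
    "MergeCompilerRegions",  # candidate for being made optional per backend
    "PartitionGraph",
]

def build_partition_pipeline(custom_hook=None, merge_compiler_regions=True):
    """Return the pass sequence: a backend-supplied hook replaces the whole
    default sequence; otherwise MergeCompilerRegions can be dropped."""
    if custom_hook is not None:
        return list(custom_hook)
    return [p for p in DEFAULT_PARTITION_PASSES
            if merge_compiler_regions or p != "MergeCompilerRegions"]
```

A backend like dnnl would use the default, while one that does not want merged regions would pass merge_compiler_regions=False, and a fully custom flow would supply its own Sequential via the hook.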

@sunggg sunggg left a comment

Thanks for the great proposal! I'm interested in how we can customize the pass pipeline for diverse contexts and have a question regarding this. Looking forward to learning more about UMA!

cgerum commented Mar 16, 2022

I'm wondering whether we should provide an optional partitioning hook as well -- so it can be anything (i.e. any Sequential) -- and let the default be a Sequential of MergeComposite, AnnotateTarget, MergeCompilerRegions, PartitionGraph. WDYT?

Considering how partitioning is handled in #62 I would probably prefer a more declarative way of specifying different partitioning patterns. @MichaelJKlaiber @PaulPalomeroBernardo

@lhutton1 lhutton1 left a comment

Thanks for the great proposal @MichaelJKlaiber! While reading the RFC I picked up on a couple of small things; feel free to ignore them :)

One overall question I have is whether this proposal is strictly limited to accelerators or whether it could also be used by any back-end that leverages the target hook functionality? For example, it seems possible to register kernel libraries (e.g. CMSIS-NN) using a similar interface?

@MichaelJKlaiber

One overall question I have is whether this proposal is strictly limited to accelerators or whether it could also be used by any back-end that leverages the target hook functionality? For example, it seems possible to register kernel libraries (e.g. CMSIS-NN) using a similar interface?

@lhutton1, I agree in general. The primary focus of UMA at the moment is accelerators. It might make sense to bear library integration in mind. In general, I see that it should be possible. The main difference might be the choice of configuration parameters needed.
@cgerum @PaulPalomeroBernardo what are your thoughts here?

@lhutton1, if you have suggestions or concrete examples, feel free to share them.

  • Michael

@lhutton1

Thanks @MichaelJKlaiber, that makes sense. So I was wondering: if, in the future, this interface is used by other backends (not accelerators), we would need to think about renaming UMA to something more generic, e.g. UMB Universal Modular Backend -- I'm not the best with names. I'm wondering if this could easily be done in the future?

@MichaelJKlaiber

Thanks @MichaelJKlaiber, that makes sense. So I was wondering: if, in the future, this interface is used by other backends (not accelerators), we would need to think about renaming UMA to something more generic, e.g. UMB Universal Modular Backend -- I'm not the best with names. I'm wondering if this could easily be done in the future?

@lhutton1, I think for naming there are no limits for creativity, e.g. it could be UMA: Universal Modular bAckend 😄

PaulPalomeroBernardo and others added 5 commits April 1, 2022 11:30
* Add descriptions for all API functions
* Clarify backend registration and add target hook explanation
* Remove schedules from API and corresponding descriptions
Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md
areusch commented May 11, 2022

for A1, what do people think of an ordering spec e.g.

self._register_relay_pass(ConfigGenerator(), before="FuseOps", after=("MergeComposite", "abcOtherPass"))

my concern is that, whether int or enum, people are really expressing a dependency graph here, and while we hope it is not terribly complicated, it's hard to intuit the meaning from an enum/int.

for A2, i agree with @manupa-arm 's question, but i think it would be awesome to see the prototype and we could discuss from there.

@PaulPalomeroBernardo

So here is my take on A2:

In the accelerator-specific backend, a user would register target attribute names, e.g.

class UltraTrailBackend(UMABackend):
    def __init__(self):
        super(UltraTrailBackend, self).__init__()

        #######################################################################
        # Target configuration
        #######################################################################
        self._register_target_attr("ultra_trail_attr_1")
        self._register_target_attr("ultra_trail_attr_2")

They can be used during target creation similar to other sub_target strings

ut_target = tvm.target.Target("ultra_trail -ultra_trail_attr_1=attr1 -ultra_trail_attr_2=attr2")

This is basically implemented by passing a list of attribute names to the target kind registration

TVM_REGISTER_GLOBAL("relay.backend.contrib.uma.RegisterTarget")
    .set_body_typed([](String target_name, Array<String> attr_names){
        auto target_kind = ::tvm::TargetKindRegEntry::RegisterOrGet(target_name)
        .set_name()
        .set_device_type(kDLCPU)
        .add_attr_option<Array<String>>("keys")
        .add_attr_option<String>("tag")
        .add_attr_option<String>("device")
        .add_attr_option<String>("model")
        .add_attr_option<Array<String>>("libs")
        .add_attr_option<Target>("host")
        .add_attr_option<Integer>("from_device")
        .set_attr<FTVMRelayToTIR>("RelayToTIR", relay::contrib::uma::RelayToTIR(target_name))
        .set_attr<FTVMTIRToRuntime>("TIRToRuntime", relay::contrib::uma::TIRToRuntime);

        for (auto &attr_name : attr_names) {
            target_kind.add_attr_option<String>(attr_name);
        }
    });

The main downside I see with this is that all attributes are treated as strings, since the type is hardcoded. However, I'm not sure if we can avoid this at all.

What do you think?

For A1:
I would like to keep the phases. They definitely need proper documentation, but I think a handful of phases (e.g., PRE_PARTITIONING, POST_PARTITIONING, ...) provides more orientation for new users than having to explicitly define the dependencies on other passes. We could think of also supporting the before and after options to provide more flexibility for experienced users.

cgerum commented May 16, 2022

The main downside I see with this is that all attributes are treated as strings, since the type is hardcoded. However, I'm not sure if we can avoid this at all.

This could probably be solved by adding type and/or default arguments to the argument parser, e.g.:

  self._register_target_attr("ultra_trail_attr_1", default=False)

For v1 I would prefer to only support default values, and restrict supported dtypes to string, int and bool.
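A sketch of what default-based attribute registration with the proposed v1 type restriction could look like (the type-inference-from-default mechanism and all names are assumptions of this sketch, not a settled design):

```python
SUPPORTED_ATTR_TYPES = (str, int, bool)  # v1 restriction proposed above

class AttrRegistrySketch:
    """Toy backend that records target attributes with typed defaults."""

    def __init__(self):
        self._attrs = {}

    def _register_target_attr(self, name, default=None):
        # Infer the attribute type from the default value (sketch assumption);
        # reject types outside the proposed v1 set.
        if default is not None and not isinstance(default, SUPPORTED_ATTR_TYPES):
            raise TypeError(f"unsupported attribute type for {name!r}")
        self._attrs[name] = default

backend = AttrRegistrySketch()
backend._register_target_attr("ultra_trail_attr_1", default=False)
backend._register_target_attr("ultra_trail_attr_2", default="v1")
```

This keeps the registration call shape from the thread while avoiding the everything-is-a-string problem for the three allowed types.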

For A1: I would like to keep the phases. They definitely need proper documentation, but I think a handful of phases (e.g., PRE_PARTITIONING, POST_PARTITIONING, ...) provides more orientation for new users than having to explicitly define the dependencies on other passes. We could think of also supporting the before and after options to provide more flexibility for experienced users.

I agree with @PaulPalomeroBernardo on the user perspective.
Implementing before and after options also goes beyond the current scope of UMA, in my opinion. If we were to expose it in the UMA API, pass dependencies should probably be implemented in TVM core.

@MichaelJKlaiber

Thanks @cgerum and @PaulPalomeroBernardo . I agree, this totally makes sense like this.

@manupa-arm @areusch is this sufficiently detailed for you? I propose to discuss outstanding topics in a meeting to settle on UMA v1. We could use the Community Meeting on May 25th. Or, if these discussions are too specific for a broader audience, we can set up a separate meeting.

What are your thoughts?

manupak commented May 17, 2022

They can be used during target creation similar to other sub_target strings

ut_target = tvm.target.Target("ultra_trail -ultra_trail_attr_1=attr1 -ultra_trail_attr_2=attr2")
This could probably be solved by adding type and/or default arguments to the argument parser, e.g.:

  self._register_target_attr("ultra_trail_attr_1", default=False)

This aligns with A2.2 -- directly registering each attribute. I think this is fine for UMA v1 and aligns with the state of TVM targets today. Should we just put a note that, for future consideration, we could include a registration for a string preprocessor (A2.1) to extract attributes?

For A1:
I would like to keep the phases. They definitely need proper documentation, but I think a handfull of phases (e.g., PRE_PARTITIONING, POST_PARTITIONING, ...) provide more orientation for new users than having to explicitly define the dependencies to other passes. We could think of also supporting the before and after options to provide more flexibility for experienced users.

Again, I think the phase approach is fine for v1, as we already have that in the core compiler (which is also int-based), but I'd appreciate it if we could put a "name" on each phase to ease the reasoning in future. Similarly, we could also note as future work defining dependencies on passes -- if and when the TVM core compiler improves its pass infrastructure, we could use that information.

@PaulPalomeroBernardo

This aligns with A2.2 -- directly registering each attribute.

@manupa-arm Then, just for clarification, a few questions, because I might have misunderstood your initial idea. For A2.1, you were thinking about registering an attribute preprocessor to the target Target().add_attrs_preprocessor(Preprocessor) that would operate on a predefined attribute (e.g., -uma_attrs=<string>) by processing the <string> and creating a Dict/Map from it?

So a user would only write tvm.target.Target("ultra_trail -uma_attrs=<my custom attr string>") and in code you would access the target via target.attrs["uma_attrs"]["attr1"], target.attrs["uma_attrs"]["attr2"], etc.?

manupak commented May 17, 2022

So a user would only write tvm.target.Target("ultra_trail -uma_attrs=") and in code you would access the target via target.attrs["uma_attrs"]["attr1"], target.attrs["uma_attrs"]["attr2"], etc.?

More or less, yes -- maybe we could (re)use "mattr" instead of "uma_attrs", looking at other target kinds -- but in principle that is what I meant.
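A possible shape for such a string preprocessor, sketched in plain Python. The packed key=value;... syntax is an assumption for illustration only; the thread does not fix a concrete format:

```python
def parse_packed_attrs(attr_string):
    """Split a packed attribute string like "attr1=4;attr2=foo" into a dict.
    All values stay strings, mirroring the typing limitation discussed above."""
    attrs = {}
    for item in attr_string.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        attrs[key] = value
    return attrs

# e.g. the payload of a hypothetical Target("ultra_trail -mattr=attr1=4;attr2=foo")
attrs = parse_packed_attrs("attr1=4;attr2=foo")
```

The preprocessor would run once at target creation, after which code reads the resulting map rather than reparsing the string.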

areusch commented May 18, 2022

ok for A1 i'm good with named phases and we can modify as necessary. i think the A2.2 solution of directly registering target attrs makes sense to me. is that the direction we're aligned on here?

we can discuss this next week at the community meeting, or if we're in alignment on these two items, i think all that remains is to update the RFC to reflect the discussion here and we can approve/merge.

@mbs-octoml

Apologies for not following the conversation in detail in real time. Here are some thoughts on how we can make sure a UMA-integrated accelerator is also a Collage-supported 'backend'.

  • The registration of patterns will need to support the existing triple of (pattern name, pattern, predicate) since the predicates are necessary to control support based on dtypes, shapes, backend version, etc. No big deal.
  • I'm assuming those triples will continue to end up in either the global pattern table registry, or can be otherwise retrieved by a system like Collage which wishes to bypass the 'eager' UMA partitioning with its own search. But again no big deal, just need to know where to look.
  • Though not significant to Collage, I assume the order of application of the partitioning patterns matches the registration order?
  • Collage requires external codegen compiler names to be 1:1 with already registered target kinds with the same kind name. It also requires instances of those targets to be provided in the build targets list, even if those instances are nothing other than Target("my_backend") with no extra attributes. But the target kinds may also support additional attributes, and the various transitions into external codegen code have been changed to ensure the matching Target instance has been pushed as the Target.current() so that codegen can retrieve and extract any attributes to guide compilation. I think that matches some of the conversation above, except that the attributes can be fetched by Target.current().get_attr("foo"), but I might have missed the point in that sub-thread.
  • Collage assumes a regular build of an IRModule will respect any existing "Compiler" attributed functions already in the module. I think all that means is that the UMA partitioner should respect existing partitions, but otherwise trigger the appropriate custom downstream compilation, and given the partitioner uses the existing passes I think that should all Just Work.
  • Collage assumes it can do its partitioning before any other backend-specific passes. I'm assuming, however, that some of the Relay pass phases mentioned can be before partitioning. If so, I'm guessing we'd need to first apply those pre-partitioning phases in deterministic order in the hope that they sensibly compose, then partition using Collage, then run the post-partitioning phases as usual.
  • Collage uses the list of available Targets to guide its search, but if I understand correctly UMA uses the registration of backends to enforce a fixed partitioning order. Perhaps this suggests the Collage partitioner should be integrated as a user-controlled alternative to the default 'eager' partitioner supplied by UMA (presumably as a loop of the usual Relay MergeComposite/AnnotateTarget/MergeCompilerRegions?/PartitionGraph passes for each backend). That way the user can use the same construct-and-register-backends-of-interest API.
  • I'm surprised by the emphasis on going via TIR. Are we explicitly saying any BYOC integrations which don't need/want to go via TIR don't fall under the UMA integration API? If so that will make Collage/UMA integration harder since Collage would have to account for both UMA-style and original-style integrations.
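The (pattern name, pattern, predicate) triple from the first bullet can be sketched as follows; the pattern name, the placeholder pattern object, and the dtype-based predicate are illustrative assumptions:

```python
PATTERN_TABLE = []

def register_pattern(name, pattern, predicate=lambda expr: True):
    # Mirror the existing triple convention: the predicate gates support
    # based on properties such as dtypes, shapes, or backend version.
    PATTERN_TABLE.append((name, pattern, predicate))

register_pattern(
    "my_backend.conv2d",           # hypothetical pattern name
    "conv2d_pattern_placeholder",  # stand-in for a relay dataflow pattern
    predicate=lambda expr: expr.get("dtype") == "int8",
)

name, pattern, predicate = PATTERN_TABLE[0]
supported = predicate({"dtype": "int8"})  # a system like Collage can query this
```

A search-based partitioner only needs access to this table (wherever it lives) to evaluate each predicate against candidate subgraphs.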

Thanks,
-m

* Target registration with support for attribute options
* Pass phases as enums
@mbs-octoml

One more Collage/UMA overlap aspect: Collage distinguishes 'registered' backends (i.e. just TargetKinds) from 'activated' backends (i.e. Target objects in the provided build targets). I think, though, the proposal here is that the act of registration is also activation? I need help understanding how this will look from the user's POV in combination with targets.

Update target registration and add pass phases
@PaulPalomeroBernardo

Thanks @mbs-octoml for this detailed explanation. Being a Collage-supported backend is definitely something we want to achieve for UMA-integrated backends.

The registration of patterns will need to support the existing triple of (pattern name, pattern, predicate) since the predicates are necessary to control support based on dtypes, shapes, backend version, etc. No big deal.

We will add this to the pattern registration.

I'm assuming those triples will continue to end up in either the global pattern table registry, or can be otherwise retrieved by a system like Collage which wishes to bypass the 'eager' UMA partitioning with its own search. But again no big deal, just need to know where to look.

They are registered in the global pattern table registry during backend registration, but can also be accessed directly via the backend object if necessary.

Though not significant to Collage, I assume the order of application of the partitioning patterns matches the registration order?

Correct.

Collage requires external codegen compiler names to be 1:1 with already registered target kinds with the same kind name. It also requires instances of those targets to be provided in the build targets list, even if those instances are nothing other than Target("my_backend") with no extra attributes. But the target kinds may also support additional attributes, and the various transitions into external codegen code have been changed to ensure the matching Target instance has been pushed as the Target.current() so that codegen can retrieve and extract any attributes to guide compilation. I think that matches some of the conversation above, except that the attributes can be fetched by Target.current().get_attr("foo"), but I might have missed the point in that sub-thread.

I think this works well. After the backend registration (e.g., UMABackend.register()), the target kind, which matches the required codegen compiler name, is available. From there, a target can be created (with or without attributes) and passed to the build target list.

Collage assumes a regular build of an IRModule will respect any existing "Compiler" attributed functions already in the module. I think all that means is that the UMA partitioner should respect existing partitions, but otherwise trigger the appropriate custom downstream compilation, and given the partitioner uses the existing passes I think that should all Just Work.

I agree.

Collage assumes it can do its partitioning before any other backend-specific passes. I'm assuming, however, that some of the Relay pass phases mentioned can be before partitioning. If so, I'm guessing we'd need to first apply those pre-partitioning phases in deterministic order in the hope that they sensibly compose, then partition using Collage, then run the post-partitioning phases as usual.

Yes, we were planning to include a pre-partitioning pass phase. Passes within one pass phase should always be executed in order of their registration.

Collage uses the list of available Targets to guide its search, but if I understand correctly UMA uses the registration of backends to enforce a fixed partitioning order. Perhaps this suggests the Collage partitioner should be integrated as a user-controlled alternative to the default 'eager' partitioner supplied by UMA (presumably as a loop of the usual Relay MergeComposite/AnnotateTarget/MergeCompilerRegions?/PartitionGraph passes for each backend). That way the user can use the same construct-and-register-backends-of-interest API.

Currently a user needs to explicitly call partition() on the registered backend to perform the usual MergeComposite/AnnotateTarget/MergeCompilerRegions?/PartitionGraph passes plus the relevant relay pass phases (e.g., pre-partitioning).

backendA = MyUMABackendA()
backendB = MyUMABackendB()

backendA.register()
backendB.register()
mod = backendA.partition(mod)
mod = backendB.partition(mod)

As you described this would eagerly partition the graph depending on the call order of .partition(). This would actually give the user the opportunity to skip this partitioning and directly go for the Collage approach. I am not sure if this is the best solution though.

I'm surprised by the emphasis on going via TIR. Are we explicitly saying any BYOC integrations which don't need/want to go via TIR don't fall under the UMA integration API? If so that will make Collage/UMA integration harder since Collage would have to account for both UMA-style and original-style integrations.

As it is now, they would not fall under the UMA integration API. With UMA we wanted to wrap one specific BYOC integration into an easy-to-use interface and we decided to go with the target hooks via TIR (relay_to_tir, tir_to_runtime). However, if there is enough motivation we could think about adding relay_to_runtime as a second path. This would require greater changes to the current architecture so I don't see it as part of UMA v1 but we can take this into account for future development.

> One more collage/uma overlap aspect: Collage distinguishes 'registered' backends (ie just TargetKinds) from 'activated' backends (ie Target objects in the provided build targets). I think though the proposal here is the act of registration is also activation? I need help understanding how this will look from the user's pov in combination with targets.

There are three steps required to make use of UMA as a user.

  1. Create and instantiate a UMA backend backend = MyUMABackend()
  2. Register the backend backend.register()
  3. Apply the standard partitioning (might not be necessary with Collage)

backend.register() is registering the target kind, a pattern table, and global functions required by the UMA lowering. I think this is more or less equivalent to the Collage 'registration'. Only when the partitioning annotates a subgraph for the backend is it 'activated'.
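The registration-vs-activation distinction described above can be sketched with a toy registry (the registry names and class are invented for illustration; they are not TVM's actual global registries):

```python
# Toy global registries standing in for TVM's target-kind and
# pattern-table registries (names invented for illustration).
TARGET_KINDS = {}
PATTERN_TABLES = {}


class RegisteringBackend:
    def __init__(self, name, patterns):
        self.name = name
        self.patterns = patterns
        self.activated = False

    def register(self):
        # Registration: make the target kind and patterns globally known.
        TARGET_KINDS[self.name] = self
        PATTERN_TABLES[self.name] = self.patterns

    def annotate(self, subgraph_ops):
        # 'Activation' only happens once partitioning actually annotates
        # a subgraph for this backend.
        if any(op in self.patterns for op in subgraph_ops):
            self.activated = True
        return self.activated


backend = RegisteringBackend("my_accel", {"conv2d"})
backend.register()
print(backend.activated)   # registered, but not yet activated
backend.annotate(["conv2d", "relu"])
print(backend.activated)   # activated by partitioning
```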

@PaulPalomeroBernardo

> ok for A1 i'm good with named phases and we can modify as necessary. i think the A2.2 solution of directly registering target attrs makes sense to me. is that the direction we're aligned on here?
> we can discuss this next week at the community meeting, or if we're in alignment on these two items, i think all that remains is to update the RFC to reflect the discussion here and we can approve/merge.

@manupa-arm @areusch I think, we are aligned on this. We decided to go with the enum-based approach for A1 and use A2.2 for UMA v1. I updated the RFC accordingly (Pass Phases, Target Hooks).


@manupak manupak left a comment


Looks good to me (bar @mbs-octoml comments related to Collage).
Two nits to adjust the text but the design looks good to me.


@manupak manupak left a comment


LGTM!

I'll let @areusch and @mbs-octoml cover the Collage-related concerns here?

@mbs-octoml

```python
backendA = MyUMABackendA()
backendB = MyUMABackendB()

backendA.register()
backendB.register()
mod = backendA.partition(mod)
mod = backendB.partition(mod)
```

Ah, that's the example I was missing (sorry!). After registration I think calling backend.partition or letting CollagePartition 'do it for you' seems like a free choice, and all we have to do is make sure Collage respects all the existing pass hooks (which, since I'm moving CUTLASS over to TargetHooks, it has been forced to do anyway!).

> As it is now, they would not fall under the UMA integration API.

> Only when the partitioning annotates a subgraph for the backend, it is 'activated'.

Given the above I don't think either of these points is an issue: Collage will pick up both 'low level' and 'UMA-style' integrations without prejudice. There may be a temptation from users to add compiler configuration into the backend ctor, but it sounds like we agree we'll keep that in the Target object instances, in which case everything blends nicely.

So all LGTM from me, thanks for the extra explanation, and if any Collage-introduced friction shows up please just let me know and we can adjust mid-flight.

Best, -m


@areusch areusch left a comment


@PaulPalomeroBernardo @MichaelJKlaiber great! I think we are basically aligned here. I've asked for one more clarification given the discussion around use cases, then I think we can merge.

would you guys still like to discuss this on Wednesday? We have a few different topics, so I'm wondering if we may just need a brief 10 minutes to do an overview of the changes here? i think we can merge this RFC as soon as my one comment is addressed

```python
def target_name(self):
    return "ultra_trail"
```


could you guys add a brief example of how to use this once you implement this backend class?


This is an override of the abstract property defined in the base class UMABackend

    @property
    @abstractmethod
    def target_name(self) -> str:
        """Name of the hardware target.

        Returns
        -------
        out : str
            The hardware target name.
        """
        ...

It's primarily used internally (e.g., target kind, target related global function names).
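As a self-contained sketch of this pattern (the base-class name, the hook name, and the `relay.ext.<target>` naming scheme here are illustrative stand-ins, not the actual UMA implementation), the abstract property forces each concrete backend to supply a target name, which the base class can then use to derive internal names:

```python
from abc import ABC, abstractmethod


class UMABackendSketch(ABC):
    """Minimal stand-in for UMABackend, showing the abstract property."""

    @property
    @abstractmethod
    def target_name(self) -> str:
        """Name of the hardware target."""
        ...

    def global_func_name(self, hook: str) -> str:
        # The target name is used internally, e.g. to derive the names
        # of target-specific global functions (scheme is illustrative).
        return f"relay.ext.{self.target_name}.{hook}"


class UltraTrailBackend(UMABackendSketch):
    @property
    def target_name(self) -> str:
        return "ultra_trail"


backend = UltraTrailBackend()
print(backend.target_name)
print(backend.global_func_name("relay_to_tir"))
```

Forgetting to override `target_name` in a subclass makes instantiation fail with a `TypeError`, which is exactly the enforcement the abstract property provides.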


Hi @areusch, 10 mins to show the changes is fine.


@PaulPalomeroBernardo sorry i meant--can you add an example of how you might call tvm.relay.build() here, just so folks can understand it from user guide perspective?


@areusch Ahh right, so basically move this section up to the guide-level explanation? I think that makes sense


yes exactly! i think i'm not seeing that change otherwise i'd hit merge. can you ping again when that's done? and then i think we're good here


@areusch, done :)

@areusch areusch merged commit 6990e13 into apache:main Jun 1, 2022

areusch commented Jun 1, 2022

thanks @MichaelJKlaiber @PaulPalomeroBernardo @cgerum and others! the RFC is now merged. Please open a tracking issue and link it from this thread for discoverability.
