[RFC] UMA Universal Modular Accelerator Interface #60

MichaelJKlaiber · 2022-03-08T12:43:29Z

opening PR for UMA pre-RFC

Update code snippets

Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md

Rfc uma

manupak

I have done a round of review.

One thing that is missing is how UMA partitions operators (in the reference-level explanation). It would be great to describe what exactly is being done with the registered patterns.

One thing I'd like answered here is what sort of control it will allow over the passes run there: MergeComposite, AnnotateTarget, MergeCompilerRegions and PartitionGraph.

For example, some backends might not want the compiler regions merged; how would that be controlled?

Also, how would one register post-partitioning passes?

rfcs/00xx_UMA_Unified_Modular_Accelerator_Interface.md

manupak · 2022-03-10T11:11:31Z

rfcs/00xx_UMA_Unified_Modular_Accelerator_Interface.md

+UMA Partitioner: 
+* Register relay passes
+* Register patterns - supported sub-graph operations
+* Order: pre-partitioning passes, Graph partitioning, post-partitioning passes


For a new user or developer, it might be beneficial to explain where to position a Relay pass between "post-partitioning passes" and "_register_relay_pass", which might not be obvious to those who don't have a deeper understanding of TVM. I think this is mainly because post-partitioning passes run before the OptimizeImpl(...) sequence of passes is run in the core compiler.

For the pass registrations (_register_relay_pass and _register_tir_pass) we are following the idea of phases at which the passes are registered: def _register_relay_pass(self, phase: int, relay_pass: tvm.transform.Pass) -> None. For example, phase 0 would be pre-partitioning, and phase 1 would be post-partitioning but before OptimizeImpl.

I agree that the phases and their implications need proper documentation to help users decide where to place a pass.

Thanks!

Should we use well-defined enums instead?

For the text here (where we make a decision at the end), I think we should enumerate the following options and highlight the reasoning behind the choice:

P1. Int based: _register_relay_pass(self, phase: int, relay_pass: tvm.transform.Pass)
P2. Enum based: _register_relay_pass(self, phase: tvm.transform.uma.Phase, relay_pass: tvm.transform.Pass)
P3. Separate registrations:
_register_pre_partition_relay_pass(self, relay_pass: tvm.transform.Pass)
_register_post_partition_relay_pass(self, relay_pass: tvm.transform.Pass)
_register_post_optimization_relay_pass(self, relay_pass: tvm.transform.Pass)

Should we use well-defined enums instead?

Yes, we were talking about this internally as well. Enums are probably the preferred solution. We will update the section with the options and add our reasoning of choice. Thanks for the great input!
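To make option P2 concrete, here is a minimal sketch of enum-based phase registration. `PassPhase`, its members, and `SketchBackend` are hypothetical names chosen for illustration and are not the UMA API; the passes are plain callables standing in for `tvm.transform.Pass` objects.

```python
from enum import IntEnum, auto
from typing import Callable, Dict, List


class PassPhase(IntEnum):
    """Hypothetical phase names; the RFC text decides the real set."""
    PRE_PARTITIONING = auto()
    POST_PARTITIONING = auto()
    POST_OPTIMIZATION = auto()


Pass = Callable[[str], str]  # stand-in for tvm.transform.Pass


class SketchBackend:
    def __init__(self) -> None:
        # One list per phase; passes keep their registration order.
        self._relay_passes: Dict[PassPhase, List[Pass]] = {p: [] for p in PassPhase}

    def _register_relay_pass(self, phase: PassPhase, relay_pass: Pass) -> None:
        # Unlike a bare int, an enum lets us reject phases that do not exist.
        if not isinstance(phase, PassPhase):
            raise TypeError(f"phase must be a PassPhase, got {type(phase).__name__}")
        self._relay_passes[phase].append(relay_pass)

    def passes_for(self, phase: PassPhase) -> List[Pass]:
        return list(self._relay_passes[phase])


backend = SketchBackend()
backend._register_relay_pass(PassPhase.PRE_PARTITIONING, lambda mod: mod + "|fold")
backend._register_relay_pass(PassPhase.POST_PARTITIONING, lambda mod: mod + "|cfg")

# Run the phases in their declared order.
mod = "mod"
for phase in PassPhase:
    for relay_pass in backend.passes_for(phase):
        mod = relay_pass(mod)
print(mod)
```

Compared with P1, a call site such as `_register_relay_pass(PassPhase.PRE_PARTITIONING, ...)` documents itself, and a stray integer is rejected at registration time.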

manupak · 2022-03-10T11:14:45Z

rfcs/00xx_UMA_Unified_Modular_Accelerator_Interface.md

+mod, params = relay.frontend.from_pytorch(scripted_model, [("input_data", input_shape)])
+
+# Register a UMA backend
+UltraTrailBackend().register()


Is this the only thing the user needs to do to register the backend?
Will it do something to the effect of:

TVM_REGISTER_TARGET_KIND("accelerator_B", kDLCPU)
    .set_attr<FTVMRelayToTIR>("RelayToTIR", relay::contrib::generic::RelayToTIR("accelerator_B"))
    .set_attr<FTVMTIRToRuntime>("TIRToRuntime", relay::contrib::generic::accelerator_B::TIRToRuntime);

Is this the only thing the user needs to do to register the backend? Will it do something to the effect of:

Yes.

TVM_REGISTER_TARGET_KIND("accelerator_B", kDLCPU)
    .set_attr<FTVMRelayToTIR>("RelayToTIR", relay::contrib::generic::RelayToTIR("accelerator_B"))
    .set_attr<FTVMTIRToRuntime>("TIRToRuntime", relay::contrib::generic::accelerator_B::TIRToRuntime);

backend.register() interacts with TVM core in two ways:

1. It registers the target hooks, as you said.

2. It registers the target pattern tables using tvm.relay.op.contrib.register.register_pattern_table and ensures that both use the same compiler/target_name.

Hmm. TVM_REGISTER_TARGET_KIND is a C/C++ macro. Do we envision having some sort of Python decorator (as part of this work) to handle the registration?

@manupa-arm: This part was removed. The C++ code you mention is no longer needed for target registration.

CC: @cgerum

Just a short addition regarding the implementation. We are using a global function RegisterTarget in C++, that takes the target name and registers the target together with the target hooks. RegisterTarget is called during the backend registration UltraTrailBackend().register(). To hide this process from the user we are not using a decorator, but I think it's a similar approach.

That makes sense to me! It might be worth adding this to the Reference explanation :)

How do we deal with target-specific options, if at all?

https://github.com/apache/tvm/blob/fe7b5d329a82f720a721356c40abd721cf1d780d/src/target/target_kind.cc#L373

I added it to the Reference explanation (link).

Currently it is not possible to set target specific options. Since the UMA targets are essentially also "c" targets we did not see the need to deal with target specific options. Do you have a use-case in mind, for which this would be necessary?

Well, we currently use such options to define accelerator variants that share the same lowering pipeline.
In the absence of that, we would need to resort to PassConfig; however, PassConfig is generally better suited to configuring a specific pass. In the above case, it would mean we need to configure multiple passes.

I would see the newly registered targets as extensions of the "c" target, and I'm keen on not ending up having to dump a union of UMA target options into the "c" target.

Following your proposal, is there a reason why we won't be able to use RegisterTarget? We could consider including an AttrDict to that effect.

Adding to this, there are two variants of these :

relay.ext.<backend>.options

These define the options for the lowering. This is inherited from the original BYOC design, and we still use it with Target Hooks. This is partly due to the separate existence of kCompiler strings and actual targets.

target_kind options

This is what I alluded to in the previous comment.

Ideally, since UMA is wrapping Target Hooks, I suppose if we want to add this, we would want to proceed with the second option -- hence the suggestion.

Adding to this, there are two variants of these :

relay.ext.<backend>.options

These define the options for the lowering. This is inherited from the original BYOC design, and we still use it with Target Hooks. This is partly due to the separate existence of kCompiler strings and actual targets.

Yes, this is not exactly user-friendly.

target_kind options

This is what I alluded to in the previous comment.

Ideally, since UMA is wrapping Target Hooks, I suppose if we want to add this, we would want to proceed with the second option -- hence the suggestion.

Adding them to the TargetKind would obviously be the preferred solution and, potentially, the preferred way of doing things. It would allow for a clean target = Target("ethos-u -version=ethosu-256", host=Target("c")), but this makes the current partitioning flow somewhat clumsy, as we still need to run .partition for the target hooks. On the other hand, it could serve as a starting point for deprecating partition_for and providing a unified API for Collage and non-Collage flows. @mbs-octoml @areusch would that make sense?

areusch · 2022-03-11T16:55:53Z

cc @comaniac @csullivan @jroesch @tmoreau89 @masahi

cgerum · 2022-03-14T14:52:05Z

One thing I'd like answered here is what sort of control it will allow over the passes run there: MergeComposite, AnnotateTarget, MergeCompilerRegions and PartitionGraph.

So far we had planned to standardize on MergeComposite, AnnotateTarget, MergeCompilerRegions and PartitionGraph. To get a better overview, I extracted the partitioning flows of existing BYOC targets:

| BYOC Backend | Pre Partition Passes | Partition | Post Partition Passes |
|---|---|---|---|
| arm_compute_lib | InferType | MergeComposite, AnnotateTarget, PartitionGraph | |
| bnns | InferType, FoldConstant, FoldScaleAxis, DynamicToStatic, AlterOpLayout, FoldConstant | MergeComposite, AnnotateTarget, PartitionGraph | |
| cmsisnn | | MergeComposite, AnnotateTarget, PartitionGraph | GenerateCMSISNNConstants, ScalarToTensorConstants, ExtractConstantsFromPartitionedFunction |
| cutlass | SimplifyInference, FoldConstant, FoldScaleAxis | MergeComposite, AnnotateTarget, PartitionGraph | |
| dnnl | | MergeComposite, AnnotateTarget, MergeCompilerRegions, PartitionGraph | |
| ethosu | | MergeComposite, AnnotateTarget, MergeCompilerRegions, PartitionGraph | preprocess_ext_io |
| tensorrt | RemoveDropoutPass, RemoveUnusedFunctions, ConvertLayout, FoldConstant | AnnotateTarget, MergeCompilerRegions, PartitionGraph | |
| vitis_ai | RemoveUnusedFunctions, ConvertLayout, FoldConstant, InferType | ("VitisAIAnnotationPass"), MergeCompilerRegions, PartitionGraph | RemoveUnusedFunctions, ConvertLayout, FoldConstant |

Looking at the existing backends it might make sense to make MergeCompilerRegions optional. We probably do not want to support custom compiler annotations as used in vitis_ai target.

manupak · 2022-03-15T11:05:27Z

@cgerum thanks for detailed analysis!

I'm wondering whether we should provide an optional partitioning hook as well, so that it can be anything (i.e. any Sequential), and let the default be a Sequential of MergeComposite, AnnotateTarget, MergeCompilerRegions, PartitionGraph. WDYT?
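The two ideas discussed here (an optional MergeCompilerRegions step and a fully custom partitioning hook) could be combined roughly as follows. This is only a sketch: `build_partition_pipeline` and its parameters are hypothetical, and pass names are plain strings standing in for the real Relay passes that a `tvm.transform.Sequential` would hold.

```python
from typing import Callable, List, Optional

# Names of the standard BYOC partitioning passes, used here as plain strings.
DEFAULT_PARTITION_PASSES = [
    "MergeComposite",
    "AnnotateTarget",
    "MergeCompilerRegions",
    "PartitionGraph",
]


def build_partition_pipeline(
    merge_compiler_regions: bool = True,
    partition_hook: Optional[Callable[[], List[str]]] = None,
) -> List[str]:
    """Return the names of the passes to run for partitioning.

    A backend-supplied hook replaces the default sequence entirely;
    otherwise only MergeCompilerRegions can be switched off.
    """
    if partition_hook is not None:
        return list(partition_hook())
    return [
        name
        for name in DEFAULT_PARTITION_PASSES
        if merge_compiler_regions or name != "MergeCompilerRegions"
    ]


print(build_partition_pipeline())
print(build_partition_pipeline(merge_compiler_regions=False))
print(build_partition_pipeline(partition_hook=lambda: ["AnnotateTarget", "PartitionGraph"]))
```

The flag covers backends such as arm_compute_lib or cmsisnn in the table above, while the hook would be the escape hatch for flows that do not fit the standard sequence.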

sunggg

Thanks for the great proposal! I'm interested in how we can customize the pass pipeline for diverse contexts and have a question regarding this. Looking forward to learning more about UMA!

Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md

Rfc uma

cgerum · 2022-03-16T14:42:09Z

I'm wondering whether we should provide an optional partitioning hook as well, so that it can be anything (i.e. any Sequential), and let the default be a Sequential of MergeComposite, AnnotateTarget, MergeCompilerRegions, PartitionGraph. WDYT?

Considering how partitioning is handled in #62 I would probably prefer a more declarative way of specifying different partitioning patterns. @MichaelJKlaiber @PaulPalomeroBernardo

lhutton1

Thanks for the great proposal @MichaelJKlaiber! While reading the RFC I picked up on a couple of small things, feel free to ignore them :)

One overall question I have is whether this proposal is strictly limited to accelerators or whether it could also be used by any back-end that leverages the target hook functionality? For example, it seems possible to register kernel libraries (e.g. CMSIS-NN) using a similar interface?

MichaelJKlaiber · 2022-03-18T18:22:58Z

One overall question I have is whether this proposal is strictly limited to accelerators or whether it could also be used by any back-end that leverages the target hook functionality? For example, it seems possible to register kernel libraries (e.g. CMSIS-NN) using a similar interface?

@lhutton1, I agree in general. The primary focus of UMA at the moment is accelerators. It might make sense to bear library integration in mind. In general, I see that it should be possible. The main difference might be the choice of configuration parameters needed.
@cgerum @PaulPalomeroBernardo what are your thoughts here?

@lhutton1, if you have suggestions or concrete examples, feel free to share them.

Michael

uma-rfc: update to questions/comments added

lhutton1 · 2022-03-22T17:28:07Z

Thanks @MichaelJKlaiber, that makes sense. If this is the case, and in the future this interface is used by other backends (not accelerators), we would need to think about renaming UMA to something more generic, e.g. UMB (Universal Modular Backend). I'm not the best with names. I'm wondering if this could easily be done in the future?

MichaelJKlaiber · 2022-03-23T09:02:33Z

Thanks @MichaelJKlaiber, that makes sense. If this is the case, and in the future this interface is used by other backends (not accelerators), we would need to think about renaming UMA to something more generic, e.g. UMB (Universal Modular Backend). I'm not the best with names. I'm wondering if this could easily be done in the future?

@lhutton1, I think for naming there are no limits for creativity, e.g. it could be UMA: Universal Modular bAckend 😄

* Add descriptions for all API functions
* Clarify backend registration and add target hook explanation
* Remove schedules from API and corresponding descriptions

Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md

Rfc uma

areusch · 2022-05-11T18:29:44Z

for A1, what do people think of an ordering spec e.g.

self._register_relay_pass(ConfigGenerator(), before="FuseOps", after=("MergeComposite", "abcOtherPass"))

my concern is that whether int or enum, people are really expressing a dependency graph here and while we hope it is not terribly complicated, it's hard to intuit the meaning from an enum/int.
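To illustrate the "dependency graph" reading of such a before/after spec, the following sketch resolves the constraints into a run order with a topological sort. `order_passes` and the constraint dictionary layout are assumptions made for this example; TVM's pass infrastructure does not provide this today. Pass names are plain strings.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, Iterable, List, Set


def order_passes(constraints: Dict[str, Dict[str, Iterable[str]]]) -> List[str]:
    """Turn per-pass 'before'/'after' constraints into one valid run order."""
    # graph maps each pass to the set of passes that must run before it.
    graph: Dict[str, Set[str]] = {}
    for name, spec in constraints.items():
        graph.setdefault(name, set())
        for predecessor in spec.get("after", ()):
            graph[name].add(predecessor)
            graph.setdefault(predecessor, set())
        for successor in spec.get("before", ()):
            graph.setdefault(successor, set()).add(name)
    # Raises graphlib.CycleError if the constraints contradict each other.
    return list(TopologicalSorter(graph).static_order())


order = order_passes(
    {
        "ConfigGenerator": {
            "before": ["FuseOps"],
            "after": ["MergeComposite", "abcOtherPass"],
        }
    }
)
print(order)
```

The relative order of MergeComposite and abcOtherPass is left unspecified by the constraints, which is the kind of ambiguity that named phases avoid.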

for A2, i agree with @manupa-arm 's question, but i think it would be awesome to see the prototype and we could discuss from there.

PaulPalomeroBernardo · 2022-05-13T17:33:47Z

So here is my take on A2:

In the accelerator-specific backend, a user would register target attribute names e.g.

class UltraTrailBackend(UMABackend):
    def __init__(self):
        super(UltraTrailBackend, self).__init__()

        #######################################################################
        # Target configuration
        #######################################################################
        self._register_target_attr("ultra_trail_attr_1")
        self._register_target_attr("ultra_trail_attr_2")

They can be used during target creation similar to other sub_target strings

ut_target = tvm.target.Target("ultra_trail -ultra_trail_attr_1=attr1 -ultra_trail_attr_2=attr2")

This is basically implemented by passing a list of attribute names to the target kind registration

TVM_REGISTER_GLOBAL("relay.backend.contrib.uma.RegisterTarget")
    .set_body_typed([](String target_name, Array<String> attr_names){
        auto target_kind = ::tvm::TargetKindRegEntry::RegisterOrGet(target_name)
        .set_name()
        .set_device_type(kDLCPU)
        .add_attr_option<Array<String>>("keys")
        .add_attr_option<String>("tag")
        .add_attr_option<String>("device")
        .add_attr_option<String>("model")
        .add_attr_option<Array<String>>("libs")
        .add_attr_option<Target>("host")
        .add_attr_option<Integer>("from_device")
        .set_attr<FTVMRelayToTIR>("RelayToTIR", relay::contrib::uma::RelayToTIR(target_name))
        .set_attr<FTVMTIRToRuntime>("TIRToRuntime", relay::contrib::uma::TIRToRuntime);

        for (auto &attr_name : attr_names) {
            target_kind.add_attr_option<String>(attr_name);
        }
    });

The main downside I see with this, is that all attributes are treated as strings since the type is hardcoded. However, I'm not sure if we can avoid this at all.

What do you think?

For A1:
I would like to keep the phases. They definitely need proper documentation, but I think a handful of phases (e.g., PRE_PARTITIONING, POST_PARTITIONING, ...) provide more orientation for new users than having to explicitly define the dependencies on other passes. We could also think about supporting the before and after options to provide more flexibility for experienced users.

cgerum · 2022-05-16T06:40:47Z

The main downside I see with this, is that all attributes are treated as strings since the type is hardcoded. However, I'm not sure if we can avoid this at all.

This could probably be solved by adding type and/or default arguments to the argument parser, e.g.:

  self._register_target_attr("ultra_trail_attr_1", default=False)

For v1 I would prefer to only support default values, and restrict supported dtypes to string, int and bool.
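A rough sketch of how typed attributes with defaults might behave, under the restriction to string, int and bool. `TargetAttrRegistry` and its simplified target-string parser are hypothetical and only mimic the idea; in UMA the attributes would be registered on the TVM target kind instead.

```python
from typing import Dict, Union

AttrValue = Union[str, int, bool]
_SUPPORTED_TYPES = (str, int, bool)


class TargetAttrRegistry:
    """Hypothetical stand-in for attribute registration on a target kind."""

    def __init__(self, target_name: str) -> None:
        self.target_name = target_name
        self._defaults: Dict[str, AttrValue] = {}

    def register_target_attr(self, name: str, default: AttrValue = "") -> None:
        # The type of the default value decides the type of the attribute.
        if not isinstance(default, _SUPPORTED_TYPES):
            raise TypeError(f"unsupported attribute type: {type(default).__name__}")
        self._defaults[name] = default

    def parse(self, target_str: str) -> Dict[str, AttrValue]:
        """Parse '<kind> -key=value ...' into typed attributes."""
        kind, *options = target_str.split()
        if kind != self.target_name:
            raise ValueError(f"unexpected target kind: {kind}")
        attrs = dict(self._defaults)
        for option in options:
            key, _, raw = option.lstrip("-").partition("=")
            if key not in attrs:
                raise KeyError(f"unregistered target attribute: {key}")
            attrs[key] = self._convert(raw, type(self._defaults[key]))
        return attrs

    @staticmethod
    def _convert(raw: str, attr_type: type) -> AttrValue:
        if attr_type is bool:
            return raw.lower() in ("1", "true", "yes")
        return attr_type(raw)


reg = TargetAttrRegistry("ultra_trail")
reg.register_target_attr("ultra_trail_attr_1", default=False)
reg.register_target_attr("ultra_trail_attr_2", default=8)
attrs = reg.parse("ultra_trail -ultra_trail_attr_1=true")
print(attrs)
```

Inferring the type from the default keeps the registration call to a single extra argument, at the cost of not supporting attributes without a sensible default.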

For A1: I would like to keep the phases. They definitely need proper documentation, but I think a handful of phases (e.g., PRE_PARTITIONING, POST_PARTITIONING, ...) provide more orientation for new users than having to explicitly define the dependencies on other passes. We could also think about supporting the before and after options to provide more flexibility for experienced users.

I agree with @PaulPalomeroBernardo on the user perspective.
Implementing before and after options also goes beyond the current scope of UMA, in my opinion. If we were to expose it in the UMA API, pass dependencies should probably be implemented in TVM core.

MichaelJKlaiber · 2022-05-17T08:05:49Z

Thanks @cgerum and @PaulPalomeroBernardo . I agree, this totally makes sense like this.

@manupa-arm @areusch is this sufficiently detailed for you? I propose to discuss outstanding topics in a meeting to settle on UMA-v1. We could use the Community Meeting on May 25th. Or, if these discussions are too specific for a broader audience, we can set up a separate meeting.

What are your thoughts?

manupak · 2022-05-17T08:33:05Z

They can be used during target creation similar to other sub_target strings

ut_target = tvm.target.Target("ultra_trail -ultra_trail_attr_1=attr1 -ultra_trail_attr_2=attr2")

This could probably be solved by adding type and/or default arguments to the argument parser, e.g.:

  self._register_target_attr("ultra_trail_attr_1", default=False)

This aligns with A2.2 -- directly registering each attribute. I think this is fine for UMA-v1 and aligns with the state of TVM targets today. Should we just add a note, for future consideration, about including a registration for a string preprocessor (A2.1) to extract attributes?

For A1:
I would like to keep the phases. They definitely need proper documentation, but I think a handful of phases (e.g., PRE_PARTITIONING, POST_PARTITIONING, ...) provide more orientation for new users than having to explicitly define the dependencies on other passes. We could also think about supporting the before and after options to provide more flexibility for experienced users.

Again, I think the phase approach is fine for v1, as we already have that in the core compiler (which is also int based), but I'd appreciate it if we could put a "name" on each phase to ease reasoning in the future. Similarly, we could also note, as future work, defining dependencies between passes: if and when the TVM core compiler improves its pass infrastructure, we would be able to use that information.

PaulPalomeroBernardo · 2022-05-17T10:03:43Z

This aligns with A2.2 -- directly registering each attribute.

@manupa-arm Then just for clarification a few questions because I might have misunderstood your initial idea. For A2.1 you were thinking about registering an attribute preprocessor to the target Target().add_attrs_preprocessor(Preprocessor) that would operate on a predefined attribute (e.g., -uma_attrs=<string>) by processing the <string> and creating a Dict/Map from it?

So a user would only write tvm.target.Target("ultra_trail -uma_attrs=<my custom attr string>") and in code you would access the target via target.attrs["uma_attrs"]["attr1"], target.attrs["uma_attrs"]["attr2"], etc.?

manupak · 2022-05-17T10:21:27Z

So a user would only write tvm.target.Target("ultra_trail -uma_attrs=<my custom attr string>") and in code you would access the target via target.attrs["uma_attrs"]["attr1"], target.attrs["uma_attrs"]["attr2"], etc.?

More or less yes -- maybe we could (re)use "mattr" instead of "uma_attrs" looking at other target kinds -- but in principle that is what I meant.
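For comparison with A2.2, the A2.1 idea of a string preprocessor could look roughly like this. The function name `preprocess_uma_attrs` and the comma-separated `key=value` payload format are assumptions made for this sketch; the thread does not define the format of the custom attribute string.

```python
from typing import Dict


def preprocess_uma_attrs(target_str: str, key: str = "uma_attrs") -> Dict[str, str]:
    """Extract '-<key>=a=1,b=2' from a target string and return it as a dict.

    All values stay strings; interpreting them is left to the backend.
    """
    prefix = f"-{key}="
    for token in target_str.split():
        if token.startswith(prefix):
            payload = token[len(prefix):]
            pairs = (item.partition("=") for item in payload.split(",") if item)
            return {name: value for name, _, value in pairs}
    return {}


attrs = preprocess_uma_attrs("ultra_trail -uma_attrs=attr1=fast,attr2=256")
print(attrs["attr1"], attrs["attr2"])
```

With this approach only one attribute needs to be registered on the target kind, but typing and validation move into backend code.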

areusch · 2022-05-18T19:30:19Z

ok for A1 i'm good with named phases and we can modify as necessary. i think the A2.2 solution of directly registering target attrs makes sense to me. is that the direction we're aligned on here?

we can discuss this next week at the community meeting, or if we're in alignment on these two items, i think all that remains is to update the RFC to reflect the discussion here and we can approve/merge.

mbs-octoml · 2022-05-18T22:16:02Z

Apologies for not following the conversation in detail in real time. Here are some thoughts on how we can make sure an UMA-integrated accelerator is also a Collage-supported 'backend'.

The registration of patterns will need to support the existing triple of (pattern name, pattern, predicate) since the predicates are necessary to control support based on dtypes, shapes, backend version, etc. No big deal.
I'm assuming those triples will continue to end up in either the global pattern table registry, or can be otherwise retrieved by a system like Collage which wishes to bypass the 'eager' UMA partitioning with its own search. But again no big deal, just need to know where to look.
Though not significant to Collage, I assume the order of application of the partitioning patterns matches the registration order?
Collage requires external codegen compiler names to be 1:1 with already registered target kinds with the same kind name. It also requires instances of those targets to be provided in the build targets list, even if those instances are nothing other than Target("my_backend") with no extra attributes. But the target kinds may also support additional attributes, and the various transitions into external codegen code have been changed to ensure the matching Target instance has been pushed as the Target.current() so that codegen can retrieve and extract any attributes to guide compilation. I think that matches some of the conversation above, except that the attributes can be fetched by Target.current().get_attr("foo"), but I might have missed the point in that sub-thread.
Collage assumes a regular build of an IRModule will respect any existing "Compiler" attributed functions already in the module. I think all that means is that the UMA partitioner should respect existing partitions, but otherwise trigger the appropriate custom downstream compilation, and given the partitioner uses the existing passes I think that should all Just Work.
Collage assumes it can do its partitioning before any other backend-specific passes. I'm assuming however that some of the Relay pass phases mentioned can be before partitioning. If so I'm guessing we'd need to first apply those pre-partitioning phases in deterministic order in the hope that they sensibly compose, then partition using Collage, then run the post-partitioning phases as usual.
Collage uses the list of available Targets to guide its search, but if I understand correctly UMA uses the registration of backends to enforce a fixed partitioning order. Perhaps this suggests the Collage partitioner should be integrated as a user-controlled alternative to the default 'eager' partitioner supplied by UMA (presumably as a loop of the usual Relay MergeComposite/AnnotateTarget/MergeCompilerRegions?/PartitionGraph passes for each backend). That way the user can use the same construct-and-register-backends-of-interest API.
I'm surprised by the emphasis on going via TIR. Are we explicitly saying any BYOC integrations which don't need/want to go via TIR don't fall under the UMA integration API? If so that will make Collage/UMA integration harder since Collage would have to account for both UMA-style and original-style integrations.

Thanks,
-m

* Target registration with support for attribute options
* Pass phases as enums

mbs-octoml · 2022-05-20T17:16:22Z

One more collage/uma overlap aspect: Collage distinguishes 'registered' backends (ie just TargetKinds) from 'activated' backends (ie Target objects in the provided build targets). I think though the proposal here is the act of registration is also activation? I need help understanding how this will look from the user's pov in combination with targets.

Update target registration and add pass phases

PaulPalomeroBernardo · 2022-05-23T10:14:12Z

Thanks @mbs-octoml for this detailed explanation. Being a Collage-supported backend is definitely something we want to achieve for UMA-integrated backends.

The registration of patterns will need to support the existing triple of (pattern name, pattern, predicate) since the predicates are necessary to control support based on dtypes, shapes, backend version, etc. No big deal.

We will add this to the pattern registration.
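A small sketch of what registering (pattern name, pattern, predicate) triples could look like. `PatternRegistry` and the pattern names are hypothetical; the "pattern" here is just an operator name standing in for a Relay dataflow pattern, and a call is modelled as a plain dict.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Optional

Call = Dict[str, Any]


class PatternEntry(NamedTuple):
    name: str
    pattern: str  # stand-in for a Relay dataflow pattern
    predicate: Callable[[Call], bool]


class PatternRegistry:
    def __init__(self) -> None:
        self._entries: List[PatternEntry] = []

    def register_pattern(
        self,
        name: str,
        pattern: str,
        predicate: Callable[[Call], bool] = lambda call: True,
    ) -> None:
        self._entries.append(PatternEntry(name, pattern, predicate))

    def match(self, call: Call) -> Optional[str]:
        # Entries are tried in registration order; the predicate can veto a
        # structural match based on dtype, shape, backend version, etc.
        for entry in self._entries:
            if entry.pattern == call["op"] and entry.predicate(call):
                return entry.name
        return None


registry = PatternRegistry()
registry.register_pattern(
    "my_backend.conv2d_int8", "conv2d", lambda call: call["dtype"] == "int8"
)
registry.register_pattern("my_backend.dense", "dense")

print(registry.match({"op": "conv2d", "dtype": "int8"}))
print(registry.match({"op": "conv2d", "dtype": "float32"}))
```

Making the predicate optional keeps simple registrations short while still producing a full triple for every entry.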

I'm assuming those triples will continue to end up in either the global pattern table registry, or can be otherwise retrieved by a system like Collage which wishes to bypass the 'eager' UMA partitioning with its own search. But again no big deal, just need to know where to look.

They are registered in the global pattern table registry during backend registration but can also be accessed directly over the backend object if necessary.

Though not significant to Collage, I assume the order of application of the partitioning patterns matches the registration order?

Correct.

Collage requires external codegen compiler names to be 1:1 with already registered target kinds with the same kind name. It also requires instances of those targets to be provided in the build targets list, even if those instances are nothing other than Target("my_backend") with no extra attributes. But the target kinds may also support additional attributes, and the various transitions into external codegen code have been changed to ensure the matching Target instance has been pushed as the Target.current() so that codegen can retrieve and extract any attributes to guide compilation. I think that matches some of the conversation above, except that the attributes can be fetched by Target.current().get_attr("foo"), but I might have missed the point in that sub-thread.

I think this works well. After the backend registration (e.g., UMABackend.register()), the target kind, which matches the required codegen compiler name, is available. From there, a target can be created (with or without attributes) and passed to the build target list.

Collage assumes a regular build of an IRModule will respect any existing "Compiler" attributed functions already in the module. I think all that means is that the UMA partitioner should respect existing partitions, but otherwise trigger the appropriate custom downstream compilation, and given the partitioner uses the existing passes I think that should all Just Work.

I agree.

Collage assumes it can do its partitioning before any other backend-specific passes. I'm assuming however that some of the Relay pass phases mentioned can be before partitioning. If so I'm guessing we'd need to first apply those pre-partitioning phases in deterministic order in the hope that they sensibly compose, then partition using Collage, then run the post-partitioning phases as usual.

Yes, we were planning to include a pre-partitioning pass phase. Passes within one pass phase should always be executed in order of their registration.

Collage uses the list of available Targets to guide its search, but if I understand correctly UMA uses the registration of backends to enforce a fixed partitioning order. Perhaps this suggests the Collage partitioner should be integrated as a user-controlled alternative to the default 'eager' partitioner supplied by UMA (presumably as a loop of the usual Relay MergeComposite/AnnotateTarget/MergeCompilerRegions?/PartitionGraph passes for each backend). That way the user can use the same construct-and-register-backends-of-interest API.

Currently a user needs to explicitly call partition() on the registered backend to perform the usual MergeComposite/AnnotateTarget/MergeCompilerRegions?/PartitionGraph passes plus the relevant relay pass phases (e.g., pre-partitioning).

backendA = MyUMABackendA()
backendB = MyUMABackendB()

backendA.register()
backendB.register()
mod = backendA.partition(mod)
mod = backendB.partition(mod)

As you described this would eagerly partition the graph depending on the call order of .partition(). This would actually give the user the opportunity to skip this partitioning and directly go for the Collage approach. I am not sure if this is the best solution though.
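The effect of the call order can be shown with a toy model. Everything here is a stand-in: `SketchBackend`, the list-of-dicts "graph", and the `compiler` field only imitate how eager partitioning claims operators and respects regions that are already assigned.

```python
from typing import Dict, List, Optional, Set

Node = Dict[str, Optional[str]]


class SketchBackend:
    def __init__(self, name: str, supported_ops: Set[str]) -> None:
        self.name = name
        self.supported_ops = supported_ops

    def partition(self, graph: List[Node]) -> List[Node]:
        """Claim every supported op that no earlier backend has claimed."""
        partitioned = []
        for node in graph:
            claimed = dict(node)
            if claimed["compiler"] is None and claimed["op"] in self.supported_ops:
                claimed["compiler"] = self.name
            partitioned.append(claimed)
        return partitioned


def make_graph() -> List[Node]:
    return [{"op": op, "compiler": None} for op in ("conv2d", "relu", "dense")]


backend_a = SketchBackend("backend_a", {"conv2d", "relu"})
backend_b = SketchBackend("backend_b", {"relu", "dense"})

a_then_b = backend_b.partition(backend_a.partition(make_graph()))
b_then_a = backend_a.partition(backend_b.partition(make_graph()))

# "relu" is supported by both backends, so it goes to whichever runs first.
print([node["compiler"] for node in a_then_b])
print([node["compiler"] for node in b_then_a])
```

A search-based partitioner such as Collage would instead choose the assignment of the shared operator itself, which is why skipping the eager `.partition()` calls leaves room for it.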

I'm surprised by the emphasis on going via TIR. Are we explicitly saying any BYOC integrations which don't need/want to go via TIR don't fall under the UMA integration API? If so that will make Collage/UMA integration harder since Collage would have to account for both UMA-style and original-style integrations.

As it is now, they would not fall under the UMA integration API. With UMA we wanted to wrap one specific BYOC integration into an easy-to-use interface and we decided to go with the target hooks via TIR (relay_to_tir, tir_to_runtime). However, if there is enough motivation we could think about adding relay_to_runtime as a second path. This would require greater changes to the current architecture so I don't see it as part of UMA v1 but we can take this into account for future development.

One more collage/uma overlap aspect: Collage distinguishes 'registered' backends (ie just TargetKinds) from 'activated' backends (ie Target objects in the provided build targets). I think though the proposal here is the act of registration is also activation? I need help understanding how this will look from the user's pov in combination with targets.

There are three steps required to make use of UMA as a user.

1. Create and instantiate a UMA backend: backend = MyUMABackend()
2. Register the backend: backend.register()
3. Apply the standard partitioning (might not be necessary with Collage)

backend.register() is registering the target kind, a pattern table, and global functions required by the UMA lowering. I think this is more or less equivalent with the Collage 'registration'. Only when the partitioning annotates a subgraph for the backend, it is 'activated'.

PaulPalomeroBernardo · 2022-05-23T10:54:18Z

ok for A1 i'm good with named phases and we can modify as necessary. i think the A2.2 solution of directly registering target attrs makes sense to me. is that the direction we're aligned on here?
we can discuss this next week at the community meeting, or if we're in alignment on these two items, i think all that remains is to update the RFC to reflect the discussion here and we can approve/merge.

@manupa-arm @areusch I think we are aligned on this. We decided to go with the enum-based approach for A1 and use A2.2 for UMA v1. I updated the RFC accordingly (Pass Phases, Target Hooks).

manupak

Looks good to me (bar @mbs-octoml comments related to Collage).
Two nits to adjust the text but the design looks good to me.

rfcs/00xx_UMA_Unified_Modular_Accelerator_Interface.md

Fix code snippets

manupak

LGTM!

I'll let @areusch and @mbs-octoml cover the Collage-related concerns here.

mbs-octoml · 2022-05-23T18:00:47Z

backendA = MyUMABackendA()
backendB = MyUMABackendB()

backendA.register()
backendB.register()
mod = backendA.partition(mod)
mod = backendB.partition(mod)

Ah, that's the example I was missing (sorry!). After registration, calling backend.partition or letting CollagePartition 'do it for you' seems like a free choice, and all we have to do is make sure Collage respects all the existing pass hooks (which, since I'm moving CUTLASS over to TargetHooks, it has been forced to do anyway!).

As it is now, they would not fall under the UMA integration API.
Only when the partitioning annotates a subgraph for the backend, it is 'activated'.

Given the above, I don't think either of these points is an issue: Collage will pick up both 'low level' and 'UMA-style' integrations without prejudice. There may be a temptation from users to add compiler configuration to the backend ctor, but it sounds like we agree we'll keep that in the Target object instances, in which case everything blends nicely.

So all LGTM from me, thanks for the extra explanation, and if any Collage-introduced friction shows up please just let me know and we can adjust mid-flight.

Best, -m

areusch

@PaulPalomeroBernardo @MichaelJKlaiber great! I think we are basically aligned here. I've asked for one more clarification given the discussion around use cases, then I think we can merge.

would you guys still like to discuss this on Wednesday? We have a few different topics, so I'm wondering if we may just need a brief 10 minutes to do an overview of the changes here? i think we can merge this RFC as soon as my one comment is addressed

areusch · 2022-05-23T18:11:02Z

rfcs/00xx_UMA_Unified_Modular_Accelerator_Interface.md

+    def target_name(self):
+        return "ultra_trail"
+```
+


could you guys add a brief example of how to use this once you implement this backend class?

This is an override of the abstract property defined in the base class UMABackend:

    @property
    @abstractmethod
    def target_name(self) -> str:
        """Name of the hardware target.

        Returns
        -------
        out : str
            The hardware target name.
        """
        ...

It's primarily used internally (e.g., target kind, target related global function names).
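As a rough illustration of that pattern, the sketch below shows an abstract `target_name` property being overridden and then used by the base class to derive a name. It is plain Python, not the UMA code: `BackendBase`, `UltraTrailBackend`, `codegen_hook_name`, and the `relay.ext.<name>.codegen` string format are hypothetical.

```python
from abc import ABC, abstractmethod


class BackendBase(ABC):
    """Hypothetical base class standing in for UMABackend."""

    @property
    @abstractmethod
    def target_name(self) -> str:
        """Name of the hardware target."""
        ...

    def codegen_hook_name(self) -> str:
        # Assumed naming scheme, shown only to illustrate internal use of the name.
        return f"relay.ext.{self.target_name}.codegen"


class UltraTrailBackend(BackendBase):
    @property
    def target_name(self) -> str:
        return "ultra_trail"


backend = UltraTrailBackend()
hook = backend.codegen_hook_name()
```

Because the property is abstract, a subclass that forgets to override it cannot be instantiated, so every derived name is guaranteed to have a target name behind it.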

Hi @areusch, 10 mins to show the changes is fine.

@PaulPalomeroBernardo sorry i meant--can you add an example of how you might call tvm.relay.build() here, just so folks can understand it from user guide perspective?

@areusch Ahh right, so basically move this section up to the guide-level explanation? I think that makes sense

yes exactly! i think i'm not seeing that change otherwise i'd hit merge. can you ping again when that's done? and then i think we're good here

@areusch, done :)

Final changes for UMA-RFCv1

Move backend usage to guide-level explanation

areusch · 2022-06-01T15:28:28Z

thanks @MichaelJKlaiber @PaulPalomeroBernardo @cgerum and others! the RFC is now merged. Please open a tracking issue and link it from this thread for discoverability.

MichaelJKlaiber and others added 5 commits February 25, 2022 16:23

moving uma rfc from dicuss forum here

570e4de

uma rfc update

b593e11

Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md

716a40a

Update code snippets

Merge pull request #1 from PaulPalomeroBernardo/patch-1

013b3ec

Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md

Merge pull request #2 from MichaelJKlaiber/rfc_uma

61e0f49

Rfc uma

MichaelJKlaiber changed the title ~~Rfc uma~~ [RFC] UMA Universal Modular Accelerator Interface Mar 8, 2022

manupak requested changes Mar 10, 2022

manupak requested a review from Mousius March 10, 2022 11:22

cgerum and others added 2 commits March 14, 2022 17:28

Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md

8ffd21c

UMA: Update reference level and guide level

fee37a5

sunggg reviewed Mar 16, 2022

MichaelJKlaiber and others added 4 commits March 16, 2022 10:43

uma:added _register_config reference and example

0cdbbef

uma: minor update

d943a5e

Merge pull request #4 from cgerum/rfc_uma

0f6aae3

Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md

Merge pull request #5 from MichaelJKlaiber/rfc_uma

0a68d04

Rfc uma

lhutton1 reviewed Mar 18, 2022

MichaelJKlaiber added 2 commits March 18, 2022 18:55

uma-rfc: update to questions/comments added

06b5458

Merge pull request #6 from MichaelJKlaiber/rfc_uma

fcc56ca

uma-rfc: update to questions/comments added

PaulPalomeroBernardo and others added 5 commits April 1, 2022 11:30

Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md

0dda395

* Add descriptions for all API functions * Clarify backend registration and add target hook explanation * Remove schedules from API and corresponding descriptions

Merge pull request #7 from PaulPalomeroBernardo/patch-2

e7a490d

Update 00xx_UMA_Unified_Modular_Accelerator_Interface.md

Merge pull request #4 from boschresearch/rfc_uma

7542bf4

Rfc uma

uma_pipeline image update

9d7e6b5

uma_pipeline image update

b9a6cf3

Update target registration and add pass phases

bdedf0a

* Target registration with support for attribute options * Pass phases as enums

Merge pull request #10 from PaulPalomeroBernardo/patch-1

ac4dc76

Update target registration and add pass phases

manupak reviewed May 23, 2022

PaulPalomeroBernardo and others added 2 commits May 23, 2022 14:07

Fix code snippets

8a2731a

Merge pull request #11 from PaulPalomeroBernardo/patch-4

b559ca2

Fix code snippets

manupak approved these changes May 23, 2022

areusch reviewed May 23, 2022

MichaelJKlaiber and others added 6 commits May 25, 2022 07:16

[UMA] update

e591bbf

[UMA] update

fa9c273

[UMA] RFC PR added

87229f1

Merge pull request #12 from MichaelJKlaiber/rfc_uma

21d2849

Final changes for UMA-RFCv1

Move backend usage to guide-level explanation

a00d875

Merge pull request #13 from PaulPalomeroBernardo/patch-3

23c8ab4

Move backend usage to guide-level explanation

areusch approved these changes Jun 1, 2022

areusch merged commit 6990e13 into apache:main Jun 1, 2022

MichaelJKlaiber mentioned this pull request Jun 15, 2022

[Tracking Issue] UMA: Universal Modular Accelerator Interface apache/tvm#11260

Closed

21 tasks

MichaelJKlaiber commented Mar 8, 2022

areusch commented Mar 11, 2022

cgerum commented Mar 14, 2022

manupak commented Mar 15, 2022 (edited)

cgerum commented Mar 16, 2022

MichaelJKlaiber commented Mar 18, 2022

lhutton1 commented Mar 22, 2022

MichaelJKlaiber commented Mar 23, 2022

areusch commented May 11, 2022

PaulPalomeroBernardo commented May 13, 2022

cgerum commented May 16, 2022

MichaelJKlaiber commented May 17, 2022

manupak commented May 17, 2022

PaulPalomeroBernardo commented May 17, 2022

manupak commented May 17, 2022 (edited)

areusch commented May 18, 2022

mbs-octoml commented May 18, 2022

mbs-octoml commented May 20, 2022

PaulPalomeroBernardo commented May 23, 2022

PaulPalomeroBernardo commented May 23, 2022

mbs-octoml commented May 23, 2022

areusch commented Jun 1, 2022
