Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Target Pre-processing RFC #71

Merged
merged 4 commits into from
Aug 2, 2022
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
208 changes: 208 additions & 0 deletions rfcs/0070-target-preprocessing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,208 @@
- Feature Name: target-architecture-preprocessor
- Start Date: 2022-04-04
- RFC PR: [apache/tvm-rfcs#0070](https://github.com/apache/tvm-rfcs/pull/0000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: The RFC link doesn't point to the discussion PR. #70 points to the DeclBuffer RFC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😿

- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)

# Summary
[summary]: #summary
Provide a standard and easily testable way to inspect architecture extensions and provide them to the various parts of TVM which utilise that information.

# Motivation
[motivation]: #motivation
TVM has multiple ways to define a `Target`s architectural features for use in deciding on schedules or other calculations, here's a few different ways we do this:

* CPU to Feature Mapping: https://github.com/apache/tvm/blob/d2db9cb0d839e32778f461b77e59f6418282a511/python/tvm/target/arm_isa.py#L22-L39
* Inspecting `Target` in utility functions: https://github.com/apache/tvm/blob/d2db9cb0d839e32778f461b77e59f6418282a511/python/tvm/topi/arm_cpu/arm_utils.py#L24-L70
* Inspecting `Target` in utility functions inside legalization code: https://github.com/apache/tvm/blob/02fbaf0ed9120a8f95155e63de42459f230584aa/python/tvm/relay/qnn/op/legalizations.py#L350-L359
* Inspecting `Target` inside the definition a strategy: https://github.com/apache/tvm/blob/b542724873140bb051492530d97a78b9b7b7983d/python/tvm/relay/op/strategy/arm_cpu.py#L232
* Processing bespoke Compiler arguments: https://github.com/apache/tvm/blob/d2db9cb0d839e32778f461b77e59f6418282a511/src/relay/backend/contrib/cmsisnn/compiler_attrs.cc#L47-L70
* Registered as a `PackedFunc` (https://github.com/apache/tvm/blob/24e5498021cecca2fe7d44149ce90efe28b6d930/python/tvm/topi/x86/utils.py#L21-L34) and then used as part of `Op` processing: https://github.com/apache/tvm/blob/24e5498021cecca2fe7d44149ce90efe28b6d930/src/relay/qnn/op/requantize_config.h#L58-L73

This RFC aims to standardise the way in which we convert `Target` attributes into architectural features by processing them ahead of time.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Two additional pre-processors can be added to the `Target`, for users to preprocess architectural information when the `Target` is created:
* Architecture Pre-processing - maps `Target` `attrs` to a new `arch` object
* Keys Pre-processing - maps `Target` `attrs` and `keys` to a new set of `keys`

These new preprocessors will be illustrated using examples targeting TVM for Arm(R) Cortex(R)-M4.

## Architecture Pre-processing
```c++
Target("c")
.set_arch_preprocessor(MyArchPreprocessor)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If arch preprocessor is attached to Target, then this precludes serializing targets into a string. Maybe we should just allow subclassing of targets?

Also, here the target is already created, so it should be something like apply_arch_preprocessor...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Embarrassingly, you have found a typo, this should be TVM_REGISTER_TARGET_KIND, which should make it a lot clearer how this is meant to work 😸

When a Target is parsed, the preprocessor will re-process the arguments and therefore re-production from a string should result new a new Target with the pre-processed information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand, does this means that the only the first round trip of str -> Target -> str is allowed to introduce changes? That is, round trips of Target -> str -> Target will reproduce the original Target, and round trips of str -> Target -> str are idempotent?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str -> Target -> str continues to function as it does today, where the AttrPreprocessor may introduce changes in attributes, the KeysPreprocessor would also behave this way. Theoretically, if the logic doesn't flip flop between values that should mean that the first change from str -> Target creates all the values and you're correct that future transformations should be idempotent (I can't see a reason the logic would flip flop, but it is possible).

The ArchPreprocessor does not hold this property as the arch internal property is repopulated from attrs on instantiation.

```

This takes the `attrs` from `Target` and converts them into an object representing the architectural features of the `Target`, which can then be accessed using the `GetArch` method similar to `GetAttr`:

```c++
Target my_target("c -mcpu=cortex-m4");
my_target->GetArch<Bool>("is_aarch64", false); // false
my_target->GetArch<Bool>("has_dsp", false); // true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • What should determine whether a parameter is defined in attrs or arch?

  • How can a Target override the values used for an architecture?

    • With the current design, it looks like this would be impossible. Because only the FTVMKeysPreprocessor takes the arch as input, any two it looks like the answer is no.

    • I think this would be a useful feature in order to selectively disable features for debugging and performance comparison. Alternatively, we'd get this same benefit if GetAttr is allowed to delegate to GetArch for undefined values, and and GetArch is only used internally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main distinction is the architecture features are inferred from the Target attrs, if we want users to pass it in then it would need to be in attrs as you're correct there's no way of modifying it from a users point of view.

  • I think this would be a useful feature in order to selectively disable features for debugging and performance comparison. Alternatively, we'd get this same benefit if GetAttr is allowed to delegate to GetArch for undefined values, and and GetArch is only used internally.

I referenced this a bit here:
https://github.com/apache/tvm-rfcs/pull/71/files#diff-c04c196e8069297e1fcfdd5a514a268b971943b265649eaa41edd8d47052d56aR174-R182

I'd rather keep Target attrs separated out from arch so as not to confuse the Target API by having too many things accessible as direct properties of it?

```

```python
my_target = Target("c -mcpu=cortex-m4")
my_target.arch.is_aarch64 // false
my_target.arch.has_dsp // true
```

## Keys Pre-processing

```c++
Target("c")
.set_keys_preprocessor(MyKeysPreprocessor)
```

This takes the `attrs` from `Target` and maps them to relevant `keys` for use when selecting schedules:

```c++
Target my_target("c -mcpu=cortex-m4");
my_target->keys; // ["arm_cpu", "cpu"] <-- "cpu" is taken from default keys and merged by the pre-preprocessor
```

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Currently, there is a single `preprocessor` which takes an input of `attrs` and expects the same `attrs` returned with pre-processing applied:

https://github.com/apache/tvm/blob/d2db9cb0d839e32778f461b77e59f6418282a511/src/target/target.cc#L810-L814

In extension to this, a series of new pre-processors will be defined:

```c++
using TargetAttrs = Map<String, ObjectRef>;
using TargetArch = Map<String, ObjectRef>;
using TargetKeys = Array<String>;

using FTVMAttrPreprocessor = runtime::TypedPackedFunc<TargetAttrs(TargetAttrs)>;
using FTVMArchPreprocessor = runtime::TypedPackedFunc<TargetArch(TargetAttrs)>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function signature wouldn't allow the TargetAttrs to be modified as part of the FTVMArchPreprocessor. If we're recognizing the architecture based on a specified "arch" or "mcpu" attribute, I think we should also modify the attributes to no longer contain the "arch" or "mcpu" attributes. Otherwise, we have the same information in two different locations, which allows discrepancies between the two representations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the breadth of TVM, I don't believe we can adequately clamp this to a specific set of input attributes?

Also, I was trying to not mutate the input attrs in favour of providing the new augmented object, so when you specify my_target = Target("c -mcpu=woofles") you can reliably expect my_target.mcpu to still provide the expected answer. Once processed, .arch would give you a more detailed set of information about the combination of input attrs rather than attempting to replace them?

using FTVMKeysPreprocessor = runtime::TypedPackedFunc<TargetKeys(TargetAttrs, TargetKeys)>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the keys preprocessor is primarily used to define properties based on the architecture determined. It looks like this has significant overlap with the target tags. When should this be used instead of tags?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't see the overlap, taking the example of:

TVM_REGISTER_TARGET_TAG("raspberry-pi/4b-aarch64")
    .set_config({{"kind", String("llvm")},
                 {"mtriple", String("aarch64-linux-gnu")},
                 {"mcpu", String("cortex-a72")},
                 {"mattr", Array<String>{"+neon"}},
                 {"num-cores", Integer(4)},
                 {"host", Map<String, ObjectRef>{{"kind", String("llvm")},
                                                 {"mtriple", String("aarch64-linux-gnu")},
                                                 {"mcpu", String("cortex-a72")},
                                                 {"mattr", Array<String>{"+neon"}},
                                                 {"num-cores", Integer(4)}}}});

These are pre-configured Targets with various mtriple, mcpu and mattr attributes already set - once pre-processed these can produce a set of architecture features for subsequent steps, such as replacing this check in the operator strategy:

https://github.com/apache/tvm/blob/f88a10fb00419c51a116a63f931a98d8286b23de/python/tvm/relay/op/strategy/arm_cpu.py#L232-L245

Other tagged Targets will likely have the same mattr and mcpu, thus rather than trying to hand craft the permutations each time, the pre-processor will 😸

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also entirely possible that I'm misunderstanding. I had been both features as a way to define a bag of properties associated with some short key. For the tags the short key is the name of the tag, and for the new preprocessor the short key is the architecture name.

```

These implementations can be stored under `src/target/preprocessors/<arch_identifier>.{cc.h}` to allow them to be composed together such as:

* src/target/preprocessors/aarch64.cc
* src/target/preprocessors/cpu.cc

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd get rid of this section---it sounds like a recommendation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, do we not want to try and ensure these are consistently organised in the codebase?


Where the `cpu` pre-processor can utilise the `aarch64` pre-processor if detected.

## Rename Attr Preprocessor
To help avoid confusion between the existing `attrs` `preprocessor` and the new pre-processors, the `attrs` pre-processor will be renamed from `preprocessor` to `attr_preprocessor`:

```c++
class TargetKind {
...
FTVMAttrPreprocessor attr_preprocessor;

...
}
```

## Architecture Preprocessor
The first new pre-processor, which processes `attrs` in to an `arch` object, is registered as a new field is added to `TargetKind`:

```c++
class TargetKind {
...
FTVMArchPreprocessor arch_preprocessor;
Copy link

@kparzysz-quic kparzysz-quic May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a part of TargetKind, then you cannot customize it per specific architecture. For example, both x86-64 and aarch64 have the same kind: "llvm", but they will need different preprocessors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we'll end up with something similar to the example in clang (https://github.com/llvm/llvm-project/blob/2f04e703bff3d9858f53225fa7c780b240c3e247/clang/lib/Driver/ToolChains/Clang.cpp#L331-L343) where the actual parser can take different routes dependent on more specific hints.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@kparzysz-quic kparzysz-quic May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang driver takes flags given by the user and translates them into specific "subtarget" configuration. There is a flag -target <triple>, which roughly corresponds to a TargetMachine, and then there are other details (implicit or explicit) that eventually create TargetSubtargetInfo. That part of clang allows each target to specify the exact features, having already established what that target is. This is similar, for example, to how target kind "rocm" has attributes specific to the ROCm family, the key here being that we know we're dealing with ROCm, and not with any generic GPU.

The target kind "llvm", on the other hand, does correspond to any generic CPU, be it x86 or Arm. Flags specific to x86 may not be applicable to Arm, and vice versa. If we attach every arch preprocessor to kind "llvm", then whatever code executes them will have to know which preprocessor to apply, and currently there is no information that would aid with this selection.

This goes beyond the scope of this RFC, but my vote would be to eliminate "llvm" as a target kind, and replace it with specific kinds of CPUs. We could have target "host" that would correspond to the compilation host (and can be automatically translated to an appropriate specific CPU kind).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang driver takes flags given by the user and translates them into specific "subtarget" configuration. There is a flag -target , which roughly corresponds to a TargetMachine, and then there are other details (implicit or explicit) that eventually create TargetSubtargetInfo.

Apologies, I oversimplified, you're right that Clang/LLVM has a few more data structures to describe this (including SubTargets) - in this RFC we're essentially using Map<String, ObjectRef> as a replacement for most of the SubTarget in the .arch property. I can add further details to the Clang/LLVM Prior Art to reflect this 😸

The target kind "llvm", on the other hand, does correspond to any generic CPU, be it x86 or Arm. Flags specific to x86 may not be applicable to Arm, and vice versa. If we attach every arch preprocessor to kind "llvm", then whatever code executes them will have to know which preprocessor to apply, and currently there is no information that would aid with this selection.

Sorry, I'm not understanding so I'll provide an example 😸 , in the case of LLVM we'd attach a pre-processor which can inspect the attributes and do something similar to:

Map<String, ObjectRef> llvm_preprocessor(...attrs...) {
  bool is_aarch64 = ...logic ...;
  if (is_aarch64) {
     return aarch64_preprocessor(...attrs...);
  }
}

Therefore, whenever you instantiate a Target("llvm -march=aarch64-woof-woof-woof") the preprocessor would give you a target.arch Map which would be pre-processed using the aarch64 processor, similar to the snippet I pasted from clang with the switch logic. This does mean we have to create a preprocessor for both the llvm and c Targets to map mcpu, march, mtune or mtriple to the relevant logic path in the same way that LLVM suggests processing march and mcpu for SubTargetInfo - this preprocessor would have the freedom to generate any eventual Map<String, ObjectRef> for the specific details of a given architecture which is being targeted by the llvm Target of TVM.

This goes beyond the scope of this RFC, but my vote would be to eliminate "llvm" as a target kind, and replace it with specific kinds of CPUs. We could have target "host" that would correspond to the compilation host (and can be automatically translated to an appropriate specific CPU kind).

The Target is definitely a difficult problem to unpick, so glad you agree it's out of scope 😸 , personally I would avoid magic Targets such as "composite" and "host" in favour of better defining them - llvm is maybe just a code generator for however we describe a CPU Target? Potentially worth starting an RFC to get the discussion going and then discuss in the community meetup?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at your example: set_arch_preprocessor(MyArchPreprocessor), what arch would this apply to? If I need to add code to the llvm_preprocessor above to recognize MyArch, and then call MyArchPreprocessor, then there is no point in having the set_arch_preprocessor in the first place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thought was that MyArchPreprocessor would be the CPU preprocessor which invokes other preprocessors when it decides they're appropriate. So you'd have:

Map<String, ObjectRef> AArch64PreProcessor(...attrs...) {
  return ...things...;
}

Map<String, ObjectRef> CPUPreprocessor(...attrs...) {
  bool is_aarch64 = ...logic ...;
  if (is_aarch64) {
     return AArch64PreProcessor(...attrs...);
  }
}

TVM_REGISTER_TARGET_KIND("llvm", kDLCPU)
    .set_arch_preprocessor(CPUPreprocessor);

Which then means when you invoke:

my_target = Target("llvm -mtriple=aarch64-woof-woof");

You can inspect it as:

my_target.arch.is_aarch64 // True

This then allows us to add multiple preprocessors to the llvm Target via the single CPU preprocessor entrypoint.

Copy link

@kparzysz-quic kparzysz-quic May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be equivalent to adding arch as an attribute, and setting it using the attribute preprocessor, wouldn't it? The difference is that with your proposal we don't access arch as an attribute, but as a member.

Edit: Actually attrs is a (public) member in Target, it's just handled by helper functions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my_target.arch.is_aarch64 // True

I think it would be my_target.arch["is_aarch64"].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll come out as my_target.arch.is_aarch64 due to using __getattr__ here to prettify the attributes:
https://github.com/apache/tvm-rfcs/pull/71/files#diff-c04c196e8069297e1fcfdd5a514a268b971943b265649eaa41edd8d47052d56aR142-R150


...
}
```

This pre-processes `Target` attributes into a new field on `Target` called `arch`:
```c++
class Target {
...
DictAttrs arch;

...
}
```

Which will have similar helper methods to those seen in `IRModule` for `DictAttrs` but with reference to `Arch` rather than `Attr`:

```c++
template <typename TObjectRef>
Optional<TObjectRef> GetArch(
const std::string& attr_key,
Optional<TObjectRef> default_value = Optional<TObjectRef>(nullptr)) const {
return attrs.GetAttr(attr_key, default_value);
}
template <typename TObjectRef>
Optional<TObjectRef> GetArch(const std::string& attr_key, TObjectRef default_value) const {
return GetArch<TObjectRef>(attr_key, Optional<TObjectRef>(default_value));
}
```

As well as a Python class to represent this and allow simple access:
```python
class TargetArch {
def __init__(self, target):
self._target = target

def __getattr__(self, name):
return _ffi_api.TargetGetArch(self._target, name)
}
```

## Key Preprocessor
The second new pre-processor will populate the `keys` fields from the initial `Target` `attrs` and existing `keys`, it simply requires an additional field on `TargetKind`:

```c++
class TargetKind {
...
FTVMKeysPreprocessor keys_preprocessor;

...
}
```

As the signature of the pre-processor passes the existing keys into the `keys_preprocessor` it is responsible for merging them or removing them if necessary.

# Drawbacks
[drawbacks]: #drawbacks

By adding these new pre-processing options to `Target` we increase the amount of work incurred when instantiating a `Target`, it was ultimately considered that this one-time cost would be similar to repeatedly querying the `Target` attributes.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

## Re-use Target Attributes
Instead of creating a new field (`arch`), instead extend `Target` attributes with preprocessed results such that you can do:

```python
my_target = Target("c -mcpu=cortex-m4")
my_target.is_aarch64 # Extra attribute in `attrs`
```

It was felt this would become confusing to work with alongside the documented `Target` attributes in `target_kind.cc` or `target_kind.cc` would need to be bloated with every potential architecture field. The approach of overlapping with `Target` attributes would also increase testing overhead rather than having a straight forward `attrs` to `arch` mapping to test.

## Extend Utility Functions
Using a standalone function or class across the various areas of the codebase, such as:

```
TargetArch my_target_arch(target)
my_target_arch->is_aarch64; // false
```

This means re-processing `Target` whenever a specific attribute is required but would provide a single source of truth for doing so.

# Prior art
[prior-art]: #prior-art

* This follows the original Target Specification RFC: https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844
* Pre-processor definitions follow the pattern set out in Target Hooks: https://github.com/apache/tvm-rfcs/blob/main/rfcs/0010-target-registered-compiler-flow-customisation.md
* LLVM applies similar processing steps, using defined tables, for example: https://github.com/llvm/llvm-project/blob/7dddf12f448d7ed7e2e35a4de69b53bd140f12c0/llvm/lib/Target/ARM/ARM.td
* GCC applies similar processing steps, using defined tables, for example: https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/config/arm/arm-cpus.in

# Unresolved questions
[unresolved-questions]: #unresolved-questions

# Future possibilities
[future-possibilities]: #future-possibilities

Similar to LLVM and GCC, we may be able to use a custom file format to describe `Target`s more effectively in future which can be added using the same hooks, allowing for easier contributions.