[TIR] Support more mma intrinsics and `get_mma_intrin_group` utility #16073

Ubospica · 2023-11-05T06:12:56Z

This PR adds support for several more mma intrinsics used in matmul scheduling. Specifically, this PR:

Adds support for transposed A in `ldmatrix` and `mma_sync`.
Replaces every `T.launch_thread(tx, 32)` annotation with a loop of the form `for tx in T.thread_binding(0, WARP_SIZE, "threadIdx.x")`, which is more convenient for later transformations.
Refactors some logic and adds a utility, `get_mma_intrin_group`, that returns a group of intrinsics:

def get_mma_intrin_group(
    load_scope: Literal["shared", "shared.dyn"],
    store_scope: Literal["global", "shared", "shared.dyn"],
    in_dtype: Literal["float16", "int8"],
    out_dtype: Literal["float16", "float32", "int32"],
    trans_a: bool,
    trans_b: bool,
    not_use_mma_store_intrinic: bool = True,
    store_to_smem_dtype: Optional[Literal["float16", "float32", "int32"]] = None,
) -> Dict[str, str]

Avoids the current mma_store intrinsic and uses BufferStore statements instead.
- With the mma_store intrinsic, swizzling shared memory accesses requires a rearrangement scheme that spans regions written by different mma_store calls, which makes swizzling quite complex. BufferStore statements do not have this problem.
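
As a rough illustration of the swizzling point, here is a minimal pure-Python sketch. It does not use TVM; the buffer shape, chunk size, XOR-based `swizzle` function, and the helper names are all hypothetical choices made for this example, not the scheme used by the schedule rule. It models a shared-memory buffer holding two 16x16 tiles side by side, each of which would be written by one tile-wide store call, and shows that a per-row XOR swizzle sends elements of one tile into the region of the other, while a per-element store only needs the index mapping applied to each element.

```python
# Hypothetical shapes: a 16 x 32 shared buffer holding two 16x16 tiles side by side.
ROWS, COLS = 16, 32
TILE = 16   # width of the region written by one tile-wide store call
CHUNK = 8   # elements per swizzled chunk (assumed for this sketch)


def swizzle(row, col):
    """XOR-permute the chunk index within a row; the offset inside a chunk is kept."""
    chunk, offset = divmod(col, CHUNK)
    n_chunks = COLS // CHUNK
    return row, (chunk ^ (row % n_chunks)) * CHUNK + offset


def tiles_touched_by(tile_id):
    """Tile regions that the elements of one logical tile land in after swizzling."""
    touched = set()
    for row in range(ROWS):
        for col in range(tile_id * TILE, (tile_id + 1) * TILE):
            _, new_col = swizzle(row, col)
            touched.add(new_col // TILE)
    return touched


def is_bijection():
    """Every element must map to a distinct location, or stores would collide."""
    targets = {swizzle(r, c) for r in range(ROWS) for c in range(COLS)}
    return len(targets) == ROWS * COLS


if __name__ == "__main__":
    print("tile 0 lands in tile regions:", sorted(tiles_touched_by(0)))
    print("swizzle is a bijection:", is_bijection())
```

In this model a tile-wide store would have to be split or rewritten to follow the permutation across both regions, whereas element-wise stores only need `swizzle` applied to each `(row, col)` index.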

This PR is used and tested in the dlight matmul schedule rule.

@spectrometerHBH @vinx13 @Hzfengsy

Ubospica added 2 commits November 5, 2023 05:45

1104

74c4e5f

1104

fbccbe9

Ubospica changed the title ~~[TIR] Support more intrinsics and get_mma_intrin_group utility~~ [TIR] Support more mma intrinsics and get_mma_intrin_group utility Nov 5, 2023

Ubospica added 2 commits November 5, 2023 19:37

1105

78838cc

fix ci

1765017

vinx13 approved these changes Nov 6, 2023

fix ci

466432b

vinx13 merged commit db4290b into apache:main Nov 7, 2023
5 checks passed

Ubospica added a commit to Ubospica/tvm-develop that referenced this pull request Nov 13, 2023

[TIR] Support more mma intrinsics and get_mma_intrin_group utility (a…

606f121

…pache#16073) * 1104 * 1104 * 1105 * fix ci * fix ci

ysh329 mentioned this pull request Jan 11, 2024

[Release] v0.15.0 Release Candidate Notes #16391

Closed
