[AMD][TL] Introduce K Pack and a Conflict Free swizzling into Matrix Core #248

LeiWang1999 · 2024-11-24T13:49:57Z

This pull request changes three parts of the bitblas library: it updates the swizzle layout helpers, adds layout transformation functions for the AMD matrix core (MFMA) path, and extends the matrix core intrinsic emitter with the new k_pack and is_m_first parameters.

Swizzle Layout Updates:

bitblas/ops/general_matmul/tilelang/dense/matmul_tensorcore.py: Updated the import to use make_mma_swizzle_layout instead of make_swizzle_layout.
bitblas/tl/mma_layout.py: Added new functions for swizzle layout transformations, including get_swizzle_layout and make_mma_swizzle_layout.
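
To illustrate what a conflict-free swizzle is meant to achieve, here is a minimal sketch of a generic XOR-based swizzle. It is not the code of get_swizzle_layout or make_mma_swizzle_layout; the function names, the 8-chunk row width, and the XOR scheme itself are assumptions chosen for illustration.

```python
def xor_swizzle(row, chunk, chunks_per_row=8):
    """Map a logical (row, chunk) to a physical chunk index within the row.

    Hypothetical helper: XORs the chunk index with the row index so that
    the same logical chunk lands in a different physical chunk on each row.
    """
    # XOR only stays inside [0, chunks_per_row) for power-of-two widths.
    assert chunks_per_row & (chunks_per_row - 1) == 0
    return chunk ^ (row % chunks_per_row)


def chunks_hit_by_column(chunk, rows=8, chunks_per_row=8):
    """Physical chunks touched when `rows` threads read the same logical chunk."""
    return [xor_swizzle(r, chunk, chunks_per_row) for r in range(rows)]
```

Without swizzling, eight rows reading logical chunk 3 would all touch physical chunk 3. With the XOR mapping above, chunks_hit_by_column(3) visits eight distinct chunks, which is the property a conflict-free layout aims for.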

New Layout Transformation Functions:

bitblas/tl/mfma_layout.py: Added new functions for shared memory to local memory layout transformations, such as thread_id_shared_access_64x8_to_16x32_layout_A and shared_16x32_to_local_64x8_layout_A.
bitblas/tl/mfma_macro_generator.py: Incorporated new layout transformation functions into the MatrixCoreIntrinEmitter class and updated the get_ldmatrix_index_map method to support new layouts.
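
The function names above suggest a pair of mappings between a 16x32 shared memory tile and a 64-thread by 8-element register layout. The sketch below shows the general shape of such a forward and inverse pair. The index formulas are hypothetical and are not taken from mfma_layout.py; the actual mappings in the PR may assign elements to threads differently.

```python
ROWS, COLS = 16, 32      # shared memory tile shape
THREADS, LOCAL = 64, 8   # threads per wavefront, elements held per thread


def tile_to_thread_local(i, j):
    """Hypothetical forward map: tile coordinate (i, j) -> (thread_id, local_id)."""
    thread_id = i + ROWS * (j // LOCAL)  # each thread owns 8 contiguous columns
    local_id = j % LOCAL
    return thread_id, local_id


def thread_local_to_tile(thread_id, local_id):
    """Hypothetical inverse map: (thread_id, local_id) -> tile coordinate (i, j)."""
    i = thread_id % ROWS
    j = (thread_id // ROWS) * LOCAL + local_id
    return i, j
```

Whatever the concrete formulas are, the two directions need to be exact inverses and to cover all 16 * 32 = 512 elements once, which is cheap to check exhaustively.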

Enhancements to Matrix Core Intrinsic Emitter:

bitblas/tl/mfma_macro_generator.py: Enhanced the MatrixCoreIntrinEmitter class by adding new parameters k_pack and is_m_first, and updating methods to utilize these parameters.
bitblas/tl/mfma_macro_generator.py: Updated the ldmatrix_a, ldmatrix_b, and mfma methods to handle the new layout transformations and parameters.
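
The description does not define k_pack. One plausible reading, assumed here, is that it multiplies the K extent of the per-instruction tile so that each thread loads k_pack times as many elements at once. Under that assumption, and taking a 16x16 base tile on a 64-thread wavefront (also an assumption), the arithmetic can be sketched as follows; packed_tile_shape is a hypothetical helper, not part of MatrixCoreIntrinEmitter.

```python
WAVEFRONT_SIZE = 64  # threads per AMD wavefront (assumed)


def packed_tile_shape(micro_m=16, micro_k=16, k_pack=1):
    """Return ((rows, cols) of the shared tile, (threads, elements per thread)).

    Hypothetical helper: widens the K extent by k_pack and spreads the
    resulting tile evenly over one wavefront.
    """
    tile_k = micro_k * k_pack
    local_size = (micro_m * tile_k) // WAVEFRONT_SIZE
    return (micro_m, tile_k), (WAVEFRONT_SIZE, local_size)
```

With k_pack=2 this yields a 16x32 tile and a 64x8 per-thread layout, which is consistent with the 16x32 and 64x8 shapes in the new layout function names, although that correspondence is an inference rather than something the PR states.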

Together, these changes make the matrix core path more configurable through k_pack and is_m_first, and they aim to improve performance by using conflict-free swizzled layouts for shared memory access.

TODO Items

Warp with Block Primitives
Block Level Test Case
Documentation for these optimizations

LeiWang1999 added 7 commits November 24, 2024 13:08

Implement MFMA Make Swizzle Layout

5d977ab

Implement Test

6532ab0

format code

fc3185e

test fix

3760a5d

submodule update

fcc1cbd

implement block level test

5f10c0c

lint fix

d7c5f7a

LeiWang1999 merged commit 6f9c6ed into microsoft:main Nov 27, 2024
5 of 6 checks passed
