[AMD][TL] Introduce K Pack and a Conflict Free swizzling into Matrix Core #248
This pull request includes several changes to the `bitblas` library, focusing on improving the swizzle layout functionality and adding new methods for matrix operations. The most important changes include updating the swizzle layout methods, adding new layout transformation functions, and enhancing the matrix core intrinsic emitter.

Swizzle Layout Updates:
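As background for the conflict-free swizzling named in the PR title, here is a minimal plain-Python sketch of an XOR swizzle. It is not the code in `bitblas/tl/mma_layout.py`: the function names, the 8-chunk row width, and the grouping of banks into 16-byte chunks are all assumptions made for illustration.

```python
# Hypothetical illustration of XOR swizzling; not BitBLAS code.
# Assumption: one shared-memory row holds 8 chunks of 16 bytes, so every row
# starts at the same bank and chunk index c always maps to the same bank group.
CHUNKS_PER_ROW = 8


def xor_swizzle(row, chunk):
    """Return the physical chunk slot used to store logical (row, chunk)."""
    return chunk ^ (row % CHUNKS_PER_ROW)


def bank_groups_hit(rows, chunk, swizzled):
    """Distinct bank groups touched when every row reads the same logical chunk."""
    return {xor_swizzle(r, chunk) if swizzled else chunk for r in rows}


rows = range(8)
plain = bank_groups_hit(rows, chunk=3, swizzled=False)
swz = bank_groups_hit(rows, chunk=3, swizzled=True)

# Without swizzling all 8 rows land in one bank group (an 8-way conflict);
# with swizzling they spread over 8 different groups.
print(len(plain), len(swz))  # prints "1 8"
```

Because XOR with a fixed row index is a bijection on the chunk indices, each row is only permuted internally, so no two logical chunks in a row collide after swizzling.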
* `bitblas/ops/general_matmul/tilelang/dense/matmul_tensorcore.py`: Updated the import to use `make_mma_swizzle_layout` instead of `make_swizzle_layout`.
* `bitblas/tl/mma_layout.py`: Added new functions for swizzle layout transformations, including `get_swizzle_layout` and `make_mma_swizzle_layout`. [1] [2]

New Layout Transformation Functions:
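The shared-to-local layout helpers in this section map between a tile element `(i, j)` and a `(thread_id, local_id)` pair. The sketch below shows one way such a mapping can work for a 16x32 tile spread across 64 threads with 8 elements each. The function names, the `4 * k_pack` local size, and the index formulas are assumptions chosen for illustration, not the bodies of the functions added in `bitblas/tl/mfma_layout.py`.

```python
# Hypothetical illustration of a shared <-> local index mapping; not BitBLAS code.
# Assumption: each thread holds 4 * k_pack consecutive elements along the K axis,
# so k_pack=2 turns a 16x16 tile (64 threads x 4) into a 16x32 tile (64 threads x 8).
ROWS = 16


def shared_to_local(i, j, k_pack=2):
    """Map tile element (i, j) to (thread_id, local_id)."""
    local_size = 4 * k_pack
    return i + ROWS * (j // local_size), j % local_size


def local_to_shared(thread_id, local_id, k_pack=2):
    """Inverse map: (thread_id, local_id) back to tile element (i, j)."""
    local_size = 4 * k_pack
    return thread_id % ROWS, (thread_id // ROWS) * local_size + local_id


# Every element of the 16x32 tile is owned by exactly one (thread, slot) pair.
owners = {shared_to_local(i, j) for i in range(16) for j in range(32)}
print(len(owners))  # prints "512"
```

A forward and an inverse function are kept side by side so the round trip can be checked directly; any real layout pair should satisfy the same property.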
* `bitblas/tl/mfma_layout.py`: Added new functions for shared memory to local memory layout transformations, such as `thread_id_shared_access_64x8_to_16x32_layout_A` and `shared_16x32_to_local_64x8_layout_A`.
* `bitblas/tl/mfma_macro_generator.py`: Incorporated new layout transformation functions into the `MatrixCoreIntrinEmitter` class and updated the `get_ldmatrix_index_map` method to support new layouts. [1] [2]

Enhancements to Matrix Core Intrinsic Emitter:
* `bitblas/tl/mfma_macro_generator.py`: Enhanced the `MatrixCoreIntrinEmitter` class by adding new parameters `k_pack` and `is_m_first`, and updating methods to utilize these parameters. [1] [2]
* `bitblas/tl/mfma_macro_generator.py`: Updated the `ldmatrix_a`, `ldmatrix_b`, and `mfma` methods to handle the new layout transformations and parameters. [1] [2] [3]

These changes improve the flexibility and performance of the matrix operations within the `bitblas` library by optimizing memory layout transformations and enhancing the matrix core intrinsic emitter.

TODO Items