[DEV][TL] Support AMD Matrix Code Implementation #237

LeiWang1999 · 2024-11-07T10:35:32Z

This pull request restructures the initialization of the bitblas package and extends its layout functions. The main changes are an updated submodule reference, a refactored initialization script, and new layout functions.

Submodule Update:

Updated the tvm submodule to a new commit. (3rdparty/tvm)

Initialization Refactor:

Major refactoring of the bitblas/__init__.py file to streamline environment variable setup and module imports. This includes removing redundant code and reorganizing the import statements.
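As a rough illustration of what streamlined environment variable setup can look like, the sketch below resolves a directory from an environment variable, exports it, and adds it to `sys.path` only once. The helper name `ensure_on_path` and the variable name `SKETCH_3RDPARTY_PATH` are hypothetical and are not taken from bitblas/__init__.py.

```python
import os
import sys


def ensure_on_path(env_var: str, default_dir: str) -> str:
    """Resolve a directory from env_var, falling back to default_dir.

    Hypothetical helper for illustration only; not the code in this PR.
    The resolved path is exported back to the environment and inserted
    at the front of sys.path if it is not already present.
    """
    resolved = os.path.abspath(os.environ.get(env_var, default_dir))
    os.environ[env_var] = resolved
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


if __name__ == "__main__":
    # Assumed variable name and directory, chosen only for the example.
    print(ensure_on_path("SKETCH_3RDPARTY_PATH", "3rdparty/example"))
```

Putting the lookup, the export, and the path insertion in one helper avoids repeating the same three steps for each dependency, which is one way to remove the kind of redundancy the description mentions.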

Logging Improvements:

Improved the logging setup by adjusting the formatter and ensuring consistent string formatting. (bitblas/__init__.py)
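For readers unfamiliar with the pattern, a minimal sketch of a logger with an explicit formatter, using only the standard `logging` module, might look like the following. The logger name, the format string, and the `make_logger` helper are assumptions made for illustration; the actual format used in bitblas/__init__.py is not shown in this PR description.

```python
import logging


def make_logger(name: str = "sketch", level: int = logging.INFO) -> logging.Logger:
    """Return a logger with one stream handler and a fixed format.

    Hypothetical helper; the format string below is an assumption.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate output via the root logger
    if not logger.handlers:  # attach the handler only once
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter(
                "%(asctime)s [%(name)s:%(levelname)s]: %(message)s",
                datefmt="%Y-%m-%d %H:%M:%S",
            )
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    make_logger().info("logger configured")
```

Guarding on `logger.handlers` keeps repeated imports or repeated calls from stacking handlers and printing each message several times.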

Import Path Updates:

Updated import paths in several files to reflect the new location of the mma_macro_generator module.
- bitblas/ops/general_matmul/tilelang/dense/matmul_tensorcore.py
- bitblas/ops/general_matmul/tilelang/dense/matmul_tensorcore_s4.py
- bitblas/ops/general_matmul/tilelang/dequantize/finegrained_primitive_tensorcore.py
- bitblas/ops/general_matmul/tilelang/dequantize/finegrained_primitive_tensorcore_s4.py
- bitblas/ops/general_matmul/tilelang/dequantize/ladder_weight_transform_tensorcore.py
- bitblas/ops/general_matmul/tilelang/dequantize/ladder_weight_transform_tensorcore_s4.py
- bitblas/tl/__init__.py

New Layout Functions:

Added new layout functions for shared to local memory mapping in bitblas/tl/base_layout.py and bitblas/tl/mfma_layout.py. These functions define which shared-memory element each thread holds locally, which determines the memory access pattern of tensor operations.
- bitblas/tl/base_layout.py
- bitblas/tl/mfma_layout.py
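To make the idea of a shared to local layout function concrete, here is a small sketch in plain Python. It assumes a 16x16 tile distributed over a 64-lane wavefront with 4 elements per lane; the function names and the specific index arithmetic are hypothetical and are not copied from bitblas/tl/mfma_layout.py.

```python
from typing import Tuple


def shared_16x16_to_local_64x4(i: int, j: int) -> Tuple[int, int]:
    """Map tile coordinate (i, j) to (thread_id, local_id).

    Hypothetical layout: each lane owns 4 consecutive columns of one row.
    """
    thread_id = i + 16 * (j // 4)  # 16 rows x 4 column groups = 64 lanes
    local_id = j % 4               # position inside the lane's 4 elements
    return thread_id, local_id


def local_64x4_to_shared_16x16(thread_id: int, local_id: int) -> Tuple[int, int]:
    """Inverse of the mapping above: (thread_id, local_id) to (i, j)."""
    i = thread_id % 16
    j = (thread_id // 16) * 4 + local_id
    return i, j


if __name__ == "__main__":
    # Every one of the 256 tile elements should land in a unique slot.
    slots = {shared_16x16_to_local_64x4(i, j) for i in range(16) for j in range(16)}
    print(len(slots))
```

A layout function of this kind is only usable if it is a bijection between tile coordinates and (lane, slot) pairs, so checking the round trip is a cheap sanity test when writing a new one.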

LeiWang1999 added 23 commits October 16, 2024 19:14

Refactor Simplify function to handle multiple functions in IRModule

c4853ec

Update submodule commit reference

9a21acf

Add CUDA_DEVICE_ORDER environment variable to bashrc

f8d046b

test fix

c1371dd

lint fix

416cad2

Refactor test_general_matmul_bf16.py to use bitblas.testing.main()

9209d1e

Update submodule commit reference

1cf7570

Update Ubuntu version in install scripts based on LLVM version

5fec040

Update Ubuntu version in install scripts based on LLVM version

4e1a0d2

Update submodule commit reference

fa85f8c

Update submodule commit reference

429d5b5

Update submodule commit reference

4003509

Merge branch 'main' of https://github.com/microsoft/BitBLAS into amd_hip

1d86582

Update submodule commit reference

df3af0d

Merge branch 'main' of https://github.com/microsoft/BitBLAS into amd_hip

1f1e027

Update submodule commit reference

732dda6

Merge branch 'main' of https://github.com/microsoft/BitBLAS into amd_hip

ebffbfa

Merge branch 'main' of https://github.com/microsoft/BitBLAS into amd_hip

ff227fa

[Dev] Update subproject commit for TVM

ac62936

ignore profiler directories.

a7a239c

MFMA Support

dcedbde

lint fix

e0b36f5

Merge branch 'main' of https://github.com/microsoft/BitBLAS into amd_hip

fe668f9

LeiWang1999 merged commit ad19317 into microsoft:main Nov 7, 2024
3 of 4 checks passed

LeiWang1999 mentioned this pull request Nov 8, 2024

does BitBLAS suport ROCM/AMD GPUS #55

Open
