[Dev] Convert the quant compress from numpy into tvm runtime #126

LeiWang1999 · 2024-08-05T05:17:06Z

With the stage-3 weight propagation introduced in PR #114, the weight transform can operate at the bit level when the weight is 1 or 2 bits wide. It is time to implement a TVM version of compress, so that the conversion can be done on the unquantized weight and the bit-level permutation can be avoided.
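For readers unfamiliar with the compress step, the following is a minimal NumPy sketch of the general technique: packing low-bit integers into int8 storage. The function name `compress_lowbit`, the little-endian bit order within each byte, and the packing along the last axis are assumptions made for illustration. They are not taken from the BitBLAS sources, and the TVM version added in this PR may differ.

```python
import numpy as np


def compress_lowbit(weight: np.ndarray, bits: int) -> np.ndarray:
    """Pack unsigned `bits`-wide values along the last axis into int8 storage.

    Hypothetical helper for illustration only; layout is an assumption.
    """
    assert bits in (1, 2, 4), "sketch only handles widths that divide 8"
    per_byte = 8 // bits
    n, k = weight.shape
    assert k % per_byte == 0, "last axis must be a multiple of values-per-byte"

    # Keep only the low `bits` bits of each element.
    mask = np.uint8((1 << bits) - 1)
    w = weight.astype(np.uint8) & mask

    # Group the values that share one output byte.
    w = w.reshape(n, k // per_byte, per_byte)

    # Element i of each group is shifted left by i * bits, then OR-ed together.
    shifts = np.arange(per_byte, dtype=np.uint8) * np.uint8(bits)
    packed = np.bitwise_or.reduce(w << shifts, axis=-1)

    # Reinterpret the bytes as int8, a common storage dtype for packed weights.
    return packed.astype(np.uint8).view(np.int8)


if __name__ == "__main__":
    w = np.array([[1, 2, 3, 0, 3, 3, 3, 3]])
    print(compress_lowbit(w, bits=2))
```

Doing this packing on the host with NumPy works, but it operates on the already-quantized layout. That is the step this PR moves into the TVM runtime.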

This pull request includes several important changes to the bitblas/gpu/intrin/lop3.py file to enhance the decoding functions, as well as a minor update to the CI workflow configuration in .github/workflows/benchmark.yml and a submodule update in 3rdparty/tvm.

Enhancements to Decoding Functions:

New Decoding Functions: Added multiple new template functions for decoding various integer formats to float16 with scaling and offset capabilities. (bitblas/gpu/intrin/lop3.py) [1] [2] [3] [4]
Function Argument Handling: Introduced a helper function get_func_arguments to streamline the passing of arguments to external functions. (bitblas/gpu/intrin/lop3.py)
Offset Factor: Added offset_factor to buffer definitions to support the new decoding functions. (bitblas/gpu/intrin/lop3.py) [1] [2] [3] [4] [5] [6]
Function Calls: Updated function calls to use the new get_func_arguments helper for improved readability and maintainability. (bitblas/gpu/intrin/lop3.py) [1] [2] [3] [4]
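As a rough illustration of what "decoding to float16 with scaling and offset" means numerically, here is a NumPy reference sketch. It is not the code in bitblas/gpu/intrin/lop3.py, which this description presents as template functions. The function name `decode_u4_to_f16`, the 4-bit low-nibble-first layout, and the formula `(q - zero) * scale` are assumptions chosen for illustration.

```python
import numpy as np


def decode_u4_to_f16(packed: np.ndarray, scale: float, zero: float) -> np.ndarray:
    """Unpack two unsigned 4-bit values per int8 byte and dequantize to float16.

    Hypothetical reference semantics (assumed): out = (q - zero) * scale.
    """
    raw = packed.view(np.uint8)          # reinterpret int8 storage as raw bytes
    low = raw & 0x0F                     # first value: low nibble (assumed order)
    high = raw >> 4                      # second value: high nibble
    # Interleave so each byte expands to [low, high] along the last axis.
    q = np.stack([low, high], axis=-1).reshape(raw.shape[0], -1)
    return (q.astype(np.float16) - np.float16(zero)) * np.float16(scale)


if __name__ == "__main__":
    packed = np.array([[0x21, -1]], dtype=np.int8)
    print(decode_u4_to_f16(packed, scale=0.5, zero=8))
```

A reference like this is mainly useful as a correctness check against a fast GPU decode path. It makes no claim about how the kernels in this PR implement the conversion.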

CI Workflow Update:

Dependency Update: Changed depends-on to needs in the CI workflow configuration to improve dependency management. (.github/workflows/benchmark.yml)

Submodule Update:

Submodule Commit: Updated the submodule commit for 3rdparty/tvm to a new version. (3rdparty/tvm)

LeiWang1999 added 30 commits July 5, 2024 08:54

Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability

d8884e6

Refactor import statements for improved readability and maintainability

fc84173

Refactor import statements for improved readability and maintainability

02f64de

disable failure email for ci

397eee6

remove email notifications.

20f6ad1

move relax pass from testing to mlc_llm

b93c394

Merge branch 'main' of https://github.com/Microsoft/BitBLAS into main

ba6a6df

Refactor scripts with se check_eual_ref_scripts_with_emitter function

257693a

Lint Fix

9bb7f49

Merge branch 'main' of https://github.com/Microsoft/BitBLAS into main

39e7614

Refactor scripts with se check_eual_ref_scripts_with_emitter function

93eb5a5

bug fix in test

aa66a90

Merge branch 'main' of https://github.com/Microsoft/BitBLAS into dev

ae14a53

lint fix.

79b08e4

test cuda i4 kernel

86fd036

Refactor copyright notice in i4matmul.hpp

6b73a21

Merge branch 'main' of https://github.com/Microsoft/BitBLAS into dev

0ba90c1

Refactor BitBLASLinear test module for improved readability and maintainability

086d208

refactor test as version below python 3.9 cannot handle int32 overflow.

47a3abd

format lint for test

024b247

Refactor test_int4b_fp16_convert.py for improved readability and maintainability

bfedeaa

remove unused design file

e672a23

move tile device from package to base

21e5430

dummy impl for codegen

fd11940

Refactor file structure for ladder_permutate module

9ccfa85

Refactor backend class and fix typos in comments

7c7d73e

Deep refactor Lib related code.

47d5fc5

remove ci pull.

53dd0dd

LintFix

d58ac43

refactor builder for whl build

37cb07c

LeiWang1999 added 28 commits July 31, 2024 11:24

implement propagate func

d339037

Stage3 Ladder Permutate integration

0f6a033

get_ladder_stage3_propagate

00ec916

comments benchmark scirpts as the setting is too big

5316577

ci fix for benchmark

dd070f9

lint fix

6fcc368

chore: Update benchmark workflow to trigger on pull request comments

705580b

Add LDMatrix Transform 3

c5ba940

Support GPTQ Test

1566990

Fuse BlockReduce Schedule

c6c70ef

Support mma propagate 3

36128f3

Support MMA Propagate Stage 3

23ff5f4

Lint Fix

de3bf08

Merge block reduce for dequantze config.

d9830ba

fix codeql

e5a4485

chore: Update submodule reference to latest commit

a04282b

chore: Disable common subexpression elimination in TIR passes

314d3e9

Lint Fix

f7d33bb

Merge branch 'main' of https://github.com/Microsoft/BitBLAS into dev

db633ed

4bit related lop3 updates.

201155a

lint fix

2b73662

gptq test fix

1a6a0fd

Fix for test

e84e3ef

lint fix

f0fbb55

lint fix

bf30688

typofix

9a360ba

QuantCompress Test

ee94536

chore: Refactor quant_compress_impl.py for readability and maintainability

930cd76

LeiWang1999 merged commit 906055d into microsoft:main Aug 5, 2024
6 checks passed

LeiWang1999 mentioned this pull request Aug 5, 2024

[Dev] Refactor the weight transformation to support upcoming stage3 transform #130

Merged
