FEAT Adding experimental feature: Triton mm int8xint2 #195
Conversation
awesome work!
```python
    key=['M', 'N', 'K'],
)
@triton.jit
def matmul_kernel(
```
add a paper reference and more comments. It is a bit hard to understand the current code
Yes, I will add some comments to explain the process when I have some time
Claude / ChatGPT can be helpful
```python
    return c


def test_kernel(size=2048):
```
move this to tests/
```python
    return packed


def get_cuda_autotune_config():
```
should not be cuda?
Sorry, I didn't understand the comment
I mean Triton and CUDA are different things. Maybe replace cuda with triton
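For illustration, a minimal sketch of the suggested rename, assuming the function simply returns a list of triton.Config candidates for @triton.autotune (the block sizes and stage/warp counts below are made up, not the PR's actual values):

```python
import triton

# Hypothetical backend-neutral rename of get_cuda_autotune_config():
# the configs are Triton autotune candidates, not CUDA-specific settings.
def get_autotune_config():
    return [
        triton.Config(
            {'BLOCK_SIZE_M': 128, 'BLOCK_SIZE_N': 128, 'BLOCK_SIZE_K': 64, 'GROUP_SIZE_M': 8},
            num_stages=3, num_warps=8,
        ),
        triton.Config(
            {'BLOCK_SIZE_M': 64, 'BLOCK_SIZE_N': 64, 'BLOCK_SIZE_K': 32, 'GROUP_SIZE_M': 8},
            num_stages=4, num_warps=4,
        ),
    ]
```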
Good to merge if we can add more comments / refs to explain.
```python
    triton_output = matmul(ht.view(B * M, N), u.T.contiguous()).view(B, M, -1)

    # Validate packing and unpacking of weights
    assert (pack_weights(unpack_weights(u.T), 2) == u.T).all(), "Packed weights do not match original weights."
```
IMO we can separate the correctness of pack + unpack into another testing func
Okay, I will
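A sketch of what the split-out test could look like. The pack_weights(w, bits) and unpack_weights(p) signatures follow the diff above; the shapes and the unpacked starting point are illustrative:

```python
import torch

def test_pack_unpack_roundtrip(K=256, N=256):
    # Ternary weights in {-1, 0, 1}, stored as int8 before packing.
    w = torch.randint(-1, 2, (K, N), dtype=torch.int8)
    packed = pack_weights(w, 2)  # four 2-bit values per int8
    # Round-trip in both directions, mirroring the assert in the diff above.
    assert (unpack_weights(packed) == w).all(), "unpack(pack(w)) != w"
    assert (pack_weights(unpack_weights(packed), 2) == packed).all(), "pack(unpack(p)) != p"
```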
```python
    BLOCK_SIZE_M: tl.constexpr, BLOCK_SIZE_N: tl.constexpr, BLOCK_SIZE_K: tl.constexpr,
    GROUP_SIZE_M: tl.constexpr,
):
    # Only triggered when TRITON_DEBUG is set to 1, e.g. TRITON_DEBUG=1 python script.py
```
what is this?
It's a device_assert; it only runs when TRITON_DEBUG is not set to 0, and it ensures that K is a multiple of BLOCK_SIZE_K * 4, which is the case for weight matrices, for alignment purposes. In the future we can find a way to make it more general
Does that mean it doesn't error out even if the alignment is incorrect, when TRITON_DEBUG is not enabled? Wondering if we can use https://triton-lang.org/main/python-api/generated/triton.language.static_assert.html#triton.language.static_assert
Or can we just lift the assertion to before the kernel launch (line 158)?
Now it's working with static_assert; I just had to specify that K is tl.constexpr
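For context, a minimal sketch of the compile-time check this resolution describes, assuming K and BLOCK_SIZE_K are declared tl.constexpr as in the fix (the rest of the kernel signature is omitted):

```python
import triton
import triton.language as tl

@triton.jit
def matmul_kernel_header_sketch(
    K: tl.constexpr,             # constexpr makes K visible at compile time
    BLOCK_SIZE_K: tl.constexpr,
):
    # Fails at kernel compile time regardless of TRITON_DEBUG, unlike
    # tl.device_assert, which only fires at runtime when debugging is on.
    tl.static_assert(K % (BLOCK_SIZE_K * 4) == 0,
                     "K must be a multiple of BLOCK_SIZE_K * 4")
```

The alternative raised above, a plain Python assert before the kernel launch, would catch the same misalignment without involving the compiler at all.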
Please also update the doc: https://github.com/linkedin/Liger-Kernel?tab=readme-ov-file#experimental-kernels
For my own learning purposes, any specific reason we chose 2-bit here? I think 1-bit has some CUDA support for bit counting or matching operations, so it could be faster if implemented in the context of CUDA, and 4-bit is safer when doing weight quantization. Is 2-bit a trade-off of accuracy and speed compared to both 4-bit and 1-bit?
@qingquansong Yeah, according to the 1.58-bit LLM paper, using only -1 and 1 can actually deteriorate performance. Adding a 0 element to select important features (or not) seems like a much better approach. Plus, using -1, 1, and 0 helps in the context of matmul-free LLMs, where, with the right hardware support, it can significantly boost inference speed and reduce energy consumption, because add operations consume significantly less energy and time than mul operations.
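To make the 2-bit choice concrete, here is one illustrative packing scheme (a sketch; not necessarily what this PR's pack_weights does): map each ternary value to an unsigned 2-bit code via v + 1, then store four codes per int8 with shifts.

```python
import torch

def pack_ternary(w: torch.Tensor) -> torch.Tensor:
    """Pack four ternary values ({-1, 0, 1}) per int8 along dim 0 (illustrative)."""
    assert w.shape[0] % 4 == 0
    codes = (w + 1).to(torch.uint8)                      # {-1, 0, 1} -> {0, 1, 2}
    codes = codes.reshape(4, w.shape[0] // 4, *w.shape[1:])
    packed = torch.zeros_like(codes[0])
    for i in range(4):
        packed |= codes[i] << (2 * i)                    # 2 bits per value
    return packed.to(torch.int8)

def unpack_ternary(p: torch.Tensor) -> torch.Tensor:
    """Inverse of pack_ternary: recover four ternary int8 values per byte."""
    p = p.to(torch.uint8)
    chunks = [((p >> (2 * i)) & 0b11).to(torch.int8) - 1 for i in range(4)]
    return torch.cat(chunks, dim=0)
```

If the packing runs along the K dimension like this, the kernel's alignment requirement (K being a multiple of BLOCK_SIZE_K * 4) falls out naturally, since every packed int8 covers four consecutive K positions.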
LGTM
cc @MekkCyber please fix the style & tests and we are good to merge.
@MekkCyber CI still failing
Weird, all tests pass locally, will look into that
@MekkCyber can you follow up on this? We can merge it for the next release
Summary
Introducing int8 x int2 matrix multiplication in Triton as an experimental feature. The approach performs the matmul with on-the-fly unpacking of the packed weights, using cached tiling techniques. Currently it leverages tl.dot with int8 values, which is the most optimized method available at this time; with future hardware advancements, however, this could become significantly more efficient, particularly with ternary weights, potentially eliminating the need for multiplication altogether.
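As an illustration of the on-the-fly unpacking idea (a sketch under the same v + 1 code mapping assumed earlier, not this PR's exact kernel): inside the K loop, each tile of packed weights can be expanded into four int8 tiles with shifts and masks, and each expanded tile feeds a regular tl.dot against the int8 activations.

```python
import triton
import triton.language as tl

@triton.jit
def unpack_2bit_tile(b_packed):
    # Each int8 holds four 2-bit codes; extract them with shifts and masks,
    # then undo the +1 offset to recover ternary values in {-1, 0, 1}.
    # The & 0b11 mask also discards sign bits smeared in by the
    # arithmetic right shift on int8.
    b0 = ((b_packed >> 0) & 0b11) - 1
    b1 = ((b_packed >> 2) & 0b11) - 1
    b2 = ((b_packed >> 4) & 0b11) - 1
    b3 = ((b_packed >> 6) & 0b11) - 1
    return b0, b1, b2, b3
```

One load of the packed tile then covers four steps along K, which is where the tiling and caching pay off; on hardware with native ternary support, those multiplications could in principle become additions, which is the efficiency gain the summary alludes to.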