[WIP] Refactor & Add Quant Integration #10

shadowpa0327 · 2024-11-08T19:33:45Z

[WIP] What does this PR do?

Move kernels & customized attention module implementation from kernels/ into palu

shadowpa0327 added 19 commits November 7, 2024 16:43

refactor module structure

70a77d2

add palu's backend (customized kernel wrapper), and quantization utils for KV-Cache managements

a26414f

[NFC] code cleanup

cd46da9

add cuda kernel impl for quantization integration

64d5e03

remove duplicated code

55269a3

code cleanup

e72bf05

refactor directory structure. Move csrc into palu/

2d2fa08

add test for recompute & quant kernels

a00462a

refactor palu_attention_module & add test

c82eb05

support arbitrary seq_len of recompute kernel

49944d3

fix bug in kernels

f24ddef

remove redundant print

050b321

fix typo

addd454

fix bug when there is no residual

0a00a1c

code cleanup and fix typo

9bf2f77

add testcase in bgemm, for testing correctness for matrix shape happen during prefilling

06e1d41

add test for palu_attention with value 4-bit quant

83501b2

code cleanup

31c70c9

update test

4cfca28