Support in-situ conversion of GPTQ model to marlin format (naive GPTQ kernel also supported). #92

guoqingbao · 2024-10-21T10:21:40Z

For a 4-bit GPTQ-quantized model:

cargo run --release -- --port 2000 --weight-path /home/mistral_7b-int4/ mistral --quant marlin

This converts the GPTQ model to Marlin format in situ while the model is loading.

Please note:

In-situ conversion to Marlin format only supports 4-bit GPTQ models (with sym=True, groupsize=128 or -1, desc_act=False).
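The constraints above can be checked before attempting a conversion. The following is a minimal sketch, not code from this PR: `GptqConfig` and `check_marlin_convertible` are hypothetical names, and the fields are assumed to mirror the `bits`, `group_size`, `sym` and `desc_act` keys commonly found in a GPTQ checkpoint's `quantize_config.json`.

```rust
/// Hypothetical mirror of the quantization settings of a GPTQ checkpoint.
/// Field names are an assumption based on common `quantize_config.json` keys.
#[derive(Debug, Clone, Copy)]
pub struct GptqConfig {
    pub bits: u32,
    /// Group size used during quantization; the note above allows 128 or -1.
    pub group_size: i64,
    pub sym: bool,
    pub desc_act: bool,
}

/// Returns `Ok(())` when the settings satisfy the constraints listed above,
/// otherwise an `Err` naming the first constraint that is violated.
pub fn check_marlin_convertible(cfg: &GptqConfig) -> Result<(), String> {
    if cfg.bits != 4 {
        return Err(format!("bits must be 4, got {}", cfg.bits));
    }
    if !cfg.sym {
        return Err("sym must be true".to_string());
    }
    if cfg.group_size != 128 && cfg.group_size != -1 {
        return Err(format!(
            "group_size must be 128 or -1, got {}",
            cfg.group_size
        ));
    }
    if cfg.desc_act {
        return Err("desc_act must be false".to_string());
    }
    Ok(())
}
```

A loader could call `check_marlin_convertible` on the parsed settings and, on `Err`, fall back to the naive GPTQ kernel rather than attempting the Marlin conversion. Whether the actual implementation does this is not stated here.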


commit 8059379
Author: Guoqing Bao <topon@outlook.com>
Date: Thu Nov 21 18:07:34 2024 +0800

    Update batched results

commit 4cfe1ff
Author: Guoqing Bao <topon@outlook.com>
Date: Thu Nov 21 17:54:12 2024 +0800

    Batch sampling for argmax strategy under no repetition penalty

commit dfce2ea
Author: Guoqing Bao <topon@outlook.com>
Date: Thu Nov 21 15:35:22 2024 +0800

    Optimize token tensor padding & tweak weight path

commit 3ebf3b8
Author: Guoqing Bao <topon@outlook.com>
Date: Mon Oct 21 18:24:45 2024 +0800

    Support in-situ conversion of GPTQ model to marlin format (naive GPTQ kernel also supported). (#92)

    Support in-situ conversion of GPTQ model to marlin format (naive GPTQ kernel also supported).

commit 46b10ad
Author: Guoqing Bao <topon@outlook.com>
Date: Wed Oct 16 17:48:54 2024 +0800

    Add an example for marlin format conversion & update results (#91)

    Add an example for marlin format conversion & update results

guoqingbao added 27 commits August 13, 2024 14:04

Support in-situ quantization

899515a

Typo fix

6e791f5

Cargo fmt

504398d

Optimize quantized matmul in batch processing & update Q4K results

a3e1fc4

Merge branch 'master' into develop

7309f55

Fix bug for non-stream response

80f56ae

Ask users to provide huggingface token if no token cached and passed to the program.

bd476d3

No crash when both hidden_act and hidden_activation are set for gemma model

afb50f3

Print the number of decoded tokens for each request

616ffc6

Merge branch 'master' into develop

573a61a

Restore previous bug fix

360a227

Support softcapping (Gemma-2 models)

a33884f

Merge branch 'master' into develop

f3b1a7d

Update lib.rs

761067e

Fix Gemma-2 multiple eos/bos ids

ff84499

Custom benchmark with parameters

2c81291

Mention arguments for benchmark.py

221eace

Tweak

08f9491

Support GPTQ/Marlin format quantization (4bit weight, f16 input)

e23d8ae

Merge branch 'master' into develop

d4239ef

Support bf16 inputs for GPTQ/Marlin format quantization

d40c2b0

Merge branch 'develop' into develop

2d5b452

Merge remote-tracking branch 'eric/master' into develop

b7c6cd1

Merge remote-tracking branch master into develop

4b2fe7d

Add an example for marlin format conversion & update results

6078814

Typo fix

08b5e8a

Support in-situ conversion of GPTQ model to marlin format (naive GPTQ kernel also supported).

30a5040

guoqingbao merged commit 3ebf3b8 into master Oct 21, 2024
6 checks passed
