[Tools] Add convert tool for Llama models quantized by AutoGPTQ #276
Conversation
It's preferable to add a parameter for quantization rather than creating a new converter, as this approach allows us to integrate new quantization methods, like AWQ, in the future.
And update the description of the Converter in the README.
How about
Yes, sure. I will do that in a new pull request for the quantization document.
if self.wbits == 8:
    qweight = qweight - 128  # uint8 to int8
else:
    # pack 2 uint4 to 1 int8
uint4x2
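A minimal sketch of what those two branches do once qweight holds one quantized value per element; the helper name and the low-nibble-first packing order are assumptions, not the PR's actual code:

import torch

def repack_qweight(qweight: torch.Tensor, wbits: int) -> torch.Tensor:
    # Hypothetical helper: qweight is assumed to already be unpacked to one
    # quantized value per element (0..255 for 8-bit, 0..15 for 4-bit).
    if wbits == 8:
        # Shift the unsigned range [0, 255] down to signed int8 [-128, 127].
        return (qweight.to(torch.int16) - 128).to(torch.int8)
    # 4-bit: pack two consecutive uint4 values into one byte ("uint4x2"),
    # low nibble first, so the packed dimension shrinks by half.
    low = qweight[..., 0::2] & 0x0F
    high = qweight[..., 1::2] & 0x0F
    return ((high << 4) | low).to(torch.uint8)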
config["llama"]["end_id"] = str(hf_config["eos_token_id"]) | ||
config["llama"]["weight_data_type"] = dtype | ||
|
||
config["llama"]["quant_decoder_weights"] = str(True) |
quant_decoder_weights -> quant_weight_data_type
int8,uint8,int4,uint4,fp8,fp4
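With that rename, the config would record which storage type the quantized weights use instead of a boolean flag. A sketch of how the written entry might look under the suggestion (key name from the review, surrounding keys from the diff, values illustrative):

import configparser

config = configparser.ConfigParser()
config["llama"] = {}
config["llama"]["weight_data_type"] = "fp16"
# Storage type of the quantized decoder weights; the review lists
# int8, uint8, int4, uint4, fp8 and fp4 as the candidate values.
config["llama"]["quant_weight_data_type"] = "int8"

with open("config.ini", "w") as f:
    config.write(f)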
@@ -53,16 +53,22 @@ def get_weight_data_type(self, dtype: str):
        else:
            raise Exception(f"{self.__class__.__name__} don't support convert weight to {dtype}.")

-    def convert(self, input_dir, output_dir=None, dtype: str = "fp16", processes=8):
+    def convert(self, input_dir, output_dir=None, dtype: str = "fp16", processes=8, quantization=None):
# from_quantized_model: convert from an HF int8 model to an xFT int8 model.
def convert(self, input_dir, output_dir=None, dtype: str = "fp16", processes=8, from_quantized_model=None):
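If the parameter lands under that name, a caller-side sketch might look like this (the import path and class name follow the existing xFT convert tools but are assumptions, as are the paths):

# Hedged usage sketch, not verified against this PR.
from xfastertransformer.tools import LlamaConvert

LlamaConvert().convert(
    "/path/to/llama-7b-gptq",     # HF model quantized by AutoGPTQ
    "/path/to/xft-llama-7b",      # output directory for xFT weights
    dtype="fp16",
    from_quantized_model="gptq",  # name per the reviewer's suggestion
)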
""" | ||
|
||
if quantization != "gptq": | ||
print(f"[ERROR] Quantization method {quantization} is not supported.") |
[ERROR] The input model must be a {quantization} quantized model.
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# load the model
# load the quantized model, do not do GPTQ quantization.
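In other words, the converter only reads the checkpoint that AutoGPTQ already produced; nothing is re-quantized. A minimal sketch of that loading step, assuming a single-file safetensors checkpoint (the file name and format are assumptions about the input model):

import os
from safetensors.torch import load_file

def load_quantized_checkpoint(input_dir: str) -> dict:
    # Read the AutoGPTQ checkpoint's raw tensors (qweight, qzeros, scales, ...)
    # directly; no GPTQ quantization is performed here.
    return load_file(os.path.join(input_dir, "model.safetensors"))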
elif "scales" in name: | ||
model_named_parameters[name] = param.float() | ||
elif "qzeros" in name: | ||
qzeros = param |
zeros is uint8/uint4 from GPTQ quantization and needs to be converted to fp32 for xFT.
elif "lm_head" in name: | ||
model_named_parameters[name] = param | ||
elif "scales" in name: | ||
model_named_parameters[name] = param.float() |
scales is fp16/fp32/bf16 from GPTQ quantization and needs to be converted to fp32 for xFT.
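The sign flip in the diff below falls out of the dequantization formula: GPTQ reconstructs a weight as w ≈ scales * (q - qzeros), while xFT applies w ≈ scales * q + zeros, so the converter stores zeros = -scales * qzeros. A tiny self-check of that identity (shapes are illustrative only):

import torch

scales = torch.rand(1, 8) + 0.5                # per-group scales
qzeros = torch.randint(0, 16, (1, 8)).float()  # unpacked zero points
q = torch.randint(0, 16, (4, 8)).float()       # unpacked quantized weights

zeros = -scales * qzeros                       # what the converter stores
assert torch.allclose(scales * (q - qzeros), scales * q + zeros)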
    zeros = -scales * qzeros
    model_named_parameters[name] = zeros
elif "qweight" in name:
    # qweight is not transposed
Convert uint8 qweight from GPTQ quantization to int8 for xFT.
Convert uint4 qweight from GPTQ quantization to uint4x2 for xFT.
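For reference, AutoGPTQ stores qweight as int32 with 32 // wbits values packed along the input-channel dimension, so the converter first has to unpack it before the int8 / uint4x2 conversion above. A minimal unpacking sketch (not the PR's code; the low-bits-first order is an assumption):

import torch

def unpack_gptq_qweight(qweight: torch.Tensor, wbits: int) -> torch.Tensor:
    # qweight: (in_features * wbits // 32, out_features), dtype int32.
    mask = (1 << wbits) - 1
    shifts = torch.arange(0, 32, wbits, dtype=torch.int32)
    # (rows, cols) -> (rows, 32 // wbits, cols) -> (rows * 32 // wbits, cols)
    unpacked = (qweight.unsqueeze(1) >> shifts.view(1, -1, 1)) & mask
    return unpacked.reshape(-1, qweight.shape[1]).to(torch.uint8)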
@xiangzez CI failed on the baichuan model?
We could rebase to check this PR's status.