
[Tools] Add convert tool for Llama models quantized by AutoGPTQ #276

Merged: 7 commits merged into intel:main on Apr 3, 2024

Conversation

@xiangzez (Contributor)

No description provided.

@Duyi-Wang (Contributor)

It's preferable to add an additional param for quantization rather than creating a new converter, as this approach allows us to integrate new quantization methods in the future, like AWQ.

@Duyi-Wang (Contributor)

And update the description of the Converter in the README.

xiangzez changed the title from "Add convert tool for Llama models quantized by AutoGPTQ" to "[Tools] Add convert tool for Llama models quantized by AutoGPTQ" on Mar 28, 2024
@Duyi-Wang (Contributor)

How about quantization: Optional[str] = "gptq"? use_gptq would be inconvenient for adding new types in the future.
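
A minimal sketch of how such a string parameter could dispatch to different quantization paths; the helper names _convert_fp and _convert_gptq are illustrative, not the converter's actual internals:

from typing import Optional

# Illustrative sketch only: a single string parameter keeps the converter open
# to future methods (e.g. AWQ) without adding a new converter class per method.
# The helper names _convert_fp and _convert_gptq are hypothetical.
def convert(self, input_dir, output_dir=None, dtype: str = "fp16",
            processes=8, quantization: Optional[str] = None):
    if quantization is None:
        return self._convert_fp(input_dir, output_dir, dtype, processes)
    if quantization == "gptq":
        return self._convert_gptq(input_dir, output_dir, dtype, processes)
    raise ValueError(f"Quantization method {quantization} is not supported.")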

@xiangzez (Contributor, Author)

For documentation, I think we need a separate tutorial page for quantization. @miaojinc already wrote a doc; we should update it and put it on the main branch.

@miaojinc (Contributor)

> For documentation, I think we need a separate tutorial page for quantization. @miaojinc already wrote a doc; we should update it and put it on the main branch.

Sure, I will update the doc after this PR is merged.

@Duyi-Wang (Contributor)

> For documentation, I think we need a separate tutorial page for quantization. @miaojinc already wrote a doc; we should update it and put it on the main branch.
>
> Sure, I will update the doc after this PR is merged.

@miaojinc Could you add it to our docs? The source code is under the docs branch.

@miaojinc (Contributor)

> For documentation, I think we need a separate tutorial page for quantization. @miaojinc already wrote a doc; we should update it and put it on the main branch.
>
> Sure, I will update the doc after this PR is merged.
>
> @miaojinc Could you add it to our docs? The source code is under the docs branch.

Yes, sure. I will do that in a new pull request for the quantization document.

if self.wbits == 8:
qweight = qweight - 128 # uint8 to int8
else:
# pack 2 uint4 to 1 int8
Review comment (Contributor):

uint4x2
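
For context, a minimal sketch of the uint4x2 idea, i.e. two 4-bit values packed into one byte; the low-nibble-first ordering and the per-element input layout are assumptions, not taken from xFT:

import numpy as np

# Sketch: pack pairs of 4-bit values (one per array element) into single bytes.
# Assumes an even number of elements; nibble order is an assumption.
def pack_uint4x2(vals: np.ndarray) -> np.ndarray:
    vals = vals.astype(np.uint8) & 0xF          # keep only the low 4 bits
    low, high = vals[0::2], vals[1::2]          # pair up neighbouring elements
    return (low | (high << 4)).view(np.int8)    # reinterpret packed bytes as int8

packed = pack_uint4x2(np.array([1, 2, 15, 0]))  # -> two bytes: 0x21, 0x0F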

config["llama"]["end_id"] = str(hf_config["eos_token_id"])
config["llama"]["weight_data_type"] = dtype

config["llama"]["quant_decoder_weights"] = str(True)
Review comment (Contributor):

quant_decoder_weights -> quant_weight_data_type (int8, uint8, int4, uint4, fp8, fp4)
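
A short sketch of what the suggested rename could look like when writing the config; the wbits-to-dtype mapping is assumed for illustration:

import configparser

# Record the quantized weight dtype instead of a boolean flag.
# The mapping from wbits to a dtype string is an assumption.
config = configparser.ConfigParser()
config["llama"] = {}
wbits = 4
config["llama"]["quant_weight_data_type"] = "int8" if wbits == 8 else "uint4"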

@@ -53,16 +53,22 @@ def get_weight_data_type(self, dtype: str):
else:
raise Exception(f"{self.__class__.__name__} don't support convert weight to {dtype}.")

def convert(self, input_dir, output_dir=None, dtype: str = "fp16", processes=8):
def convert(self, input_dir, output_dir=None, dtype: str = "fp16", processes=8, quantization=None):
Review comment (Contributor):

# from_quantized_model: converts from an HF int8 model to an xFT int8 model.
def convert(self, input_dir, output_dir=None, dtype: str = "fp16", processes=8, from_quantized_model=None):

"""

if quantization != "gptq":
print(f"[ERROR] Quantization method {quantization} is not supported.")
Review comment (Contributor):

[ERROR] The input model must be a {quantization} quantized model.

if not os.path.exists(output_dir):
os.makedirs(output_dir)

# load the model
Review comment (Contributor):

# load the quantized model, do not do GPTQ quantization.

elif "scales" in name:
model_named_parameters[name] = param.float()
elif "qzeros" in name:
qzeros = param
Review comment (Contributor):

zeros is uint8/uint4 from GPTQ quantization and needs to be converted to fp32 for xFT.

elif "lm_head" in name:
model_named_parameters[name] = param
elif "scales" in name:
model_named_parameters[name] = param.float()
Review comment (Contributor):

scales is fp16/fp32/bf16 from GPTQ quantization and needs to be converted to fp32 for xFT.

zeros = - scales * qzeros
model_named_parameters[name] = zeros
elif "qweight" in name:
# qweight is not transposed
Review comment (Contributor):

uint8 qweight from GPTQ quantization needs to be converted to int8 for xFT.
uint4 qweight from GPTQ quantization needs to be converted to uint4x2 for xFT.
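
Taken together, these comments describe the per-tensor bookkeeping the converter performs. A hedged sketch in torch, with tensor names following AutoGPTQ; shapes, dtypes, and the exact xFT layout are assumed, and qweight is assumed already unpacked to one element per weight:

import torch

# Sketch of the conversions described in the review comments above.
# Shapes, dtypes and the exact xFT layout are assumptions.
def convert_gptq_tensors(qweight, scales, qzeros, wbits):
    scales = scales.float()                # fp16/bf16/fp32 scales -> fp32 for xFT
    zeros = -scales * qzeros.float()       # fold zero points into fp32 offsets
    if wbits == 8:
        qweight = (qweight.to(torch.int16) - 128).to(torch.int8)  # uint8 -> int8
    else:
        low = qweight[:, 0::2] & 0xF       # uint4 -> uint4x2: two 4-bit
        high = qweight[:, 1::2] & 0xF      # values per stored byte
        qweight = (low | (high << 4)).to(torch.uint8)
    return qweight, scales, zeros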

@changqi1 (Contributor) commented Apr 2, 2024

@xiangzez CI failed on baichuan model?

@xiangzez (Contributor, Author) commented Apr 2, 2024

@changqi1 CI issue should be fixed in #287

@Duyi-Wang (Contributor)

> @changqi1 CI issue should be fixed in #287

Rebase or merge the main branch?

@changqi1 (Contributor) commented Apr 2, 2024

We could rebase to check this PR's status.

changqi1 merged commit f74b108 into intel:main on Apr 3, 2024. 1 check passed.