
Improve WOQ model saving and loading #1375

Merged
merged 13 commits into main from recover_qparms
Mar 18, 2024

Conversation

changwangss
Contributor

@changwangss changwangss commented Mar 13, 2024

Type of Change

Code Changes

  1. Rename all algorithm names: RTN to Rtn, AWQ to Awq, TEQ to Teq, AUTOROUND to AutoRound; GPTQ is unchanged.
  2. For the Rtn/Awq/Teq algorithms, the backend now gets scales and zero points from INC instead of computing them itself.
  3. Saving: the checkpoint weight format, config.json, and quantize_config.json follow the Transformers-like API convention.
python run_generation.py --model facebook/opt-125m --woq --bits 4 --output_dir "saved_results"
  4. Loading: Transformers-like, from the saved directory (see the Python sketch after this list).
python run_generation.py --model "saved_results"

Local validation

Rtn

# int4_clip
python run_generation.py --model facebook/opt-125m --woq --bits 4
python run_generation.py --model facebook/opt-125m --woq --bits 4  --scheme "asym"
# fp8_e5m2
python run_generation.py --model facebook/opt-125m --woq --bits 8 --weight_dtype "fp8_e5m2"
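
The Rtn runs above map onto the Python API roughly as follows; this is a sketch that assumes RtnConfig exposes bits, scheme, and weight_dtype parameters mirroring the CLI flags (assumed names, not verified against the final API):

from intel_extension_for_transformers.transformers import AutoModelForCausalLM, RtnConfig

# int4_clip, asymmetric scheme (mirrors --bits 4 --scheme "asym")
int4_config = RtnConfig(bits=4, scheme="asym")
# fp8_e5m2 weights (mirrors --bits 8 --weight_dtype "fp8_e5m2")
fp8_config = RtnConfig(bits=8, weight_dtype="fp8_e5m2")

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", quantization_config=int4_config)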

Awq

python run_generation.py --model facebook/opt-125m --woq --woq_algo "Awq" --bits 4 --scheme "asym"

Teq

python run_generation.py --model facebook/opt-125m --woq --woq_algo "Teq" --bits 4 --scheme "asym"

GPTQ

python run_generation.py --model facebook/opt-125m --woq --woq_algo "GPTQ" --bits 4 --scheme "asym" --desc_act --blocksize 32 --nsamples 32 --max_input_length 256 --static_groups

AutoRound

python run_generation.py --model facebook/opt-125m --woq --woq_algo "AutoRound" --bits 4 --scheme "asym" --calib_len 1024 --nsamples 128

Description

Detailed description
JIRA ticket: xxx

Expected Behavior & Potential Risk

The expected behavior triggered by this PR.

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

zhewang1-intc and others added 2 commits March 12, 2024 22:54
Signed-off-by: changwangss <chang1.wang@intel.com>

github-actions bot commented Mar 13, 2024

⚡ Required checks status: All passing 🟢

Groups summary

🟢 Format Scan Tests workflow
Check ID Status Error details
format-scan (pylint) success
format-scan (bandit) success
format-scan (clangformat) success
format-scan (cloc) success
format-scan (cpplint) success

These checks are required after the changes to intel_extension_for_transformers/llm/operator/csrc/dispatcher/include/bestla_packq_impl.hpp, intel_extension_for_transformers/llm/operator/csrc/dispatcher/src/bestla_packq_impl.cpp, intel_extension_for_transformers/llm/operator/csrc/qbits_ut/test_packq.py, intel_extension_for_transformers/llm/quantization/gptq_utils.py, intel_extension_for_transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/llm/quantization/utils.py, intel_extension_for_transformers/transformers/modeling/modeling_auto.py, intel_extension_for_transformers/transformers/utils/config.py.

🟢 Optimize Unit Test workflow
Check ID Status Error details
optimize-unit-test-baseline success
optimize-unit-test-PR-test success
Genreate-OptimizeUT-Report success

These checks are required after the changes to intel_extension_for_transformers/transformers/modeling/modeling_auto.py, intel_extension_for_transformers/transformers/utils/config.py, intel_extension_for_transformers/llm/quantization/gptq_utils.py, intel_extension_for_transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/llm/quantization/utils.py, intel_extension_for_transformers/llm/operator/csrc/dispatcher/include/bestla_packq_impl.hpp, intel_extension_for_transformers/llm/operator/csrc/dispatcher/src/bestla_packq_impl.cpp, intel_extension_for_transformers/llm/operator/csrc/qbits_ut/test_packq.py, tests/CI/test_quantization.py, tests/CI/test_weight_only.py.

🟢 NeuralChat Unit Test
Check ID Status Error details
neuralchat-unit-test-baseline success
neuralchat-unit-test-PR-test success
Generate-NeuralChat-Report success

These checks are required after the changes to intel_extension_for_transformers/llm/quantization/gptq_utils.py, intel_extension_for_transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/llm/quantization/utils.py, intel_extension_for_transformers/transformers/modeling/modeling_auto.py, intel_extension_for_transformers/transformers/utils/config.py.

🟢 Chat Bot Test workflow
Check ID Status Error details
call-inference-llama-2-7b-chat-hf / inference test success
call-inference-mpt-7b-chat / inference test success

These checks are required after the changes to intel_extension_for_transformers/llm/quantization/gptq_utils.py, intel_extension_for_transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/llm/quantization/utils.py, intel_extension_for_transformers/transformers/modeling/modeling_auto.py, intel_extension_for_transformers/transformers/utils/config.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and is updated every 180 seconds for 360 minutes. If you have any other questions, contact VincyZhang or XuehaoSun for help.

Contributor

@a32543254 a32543254 left a comment


LGTM

@a32543254 a32543254 closed this Mar 14, 2024
@a32543254 a32543254 reopened this Mar 14, 2024
@a32543254
Contributor

a32543254 commented Mar 14, 2024

After this PR, can we load an HF GPTQ/AWQ model for inference on both CPU and GPU?

Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
@changwangss
Contributor Author

nf4, fp4, and int4_fullrange are waiting for the kernel to be ready.
The int8 issue will be fixed later.
Docs and the README will be updated in the next PR.

@VincyZhang VincyZhang merged commit 30d9d10 into main Mar 18, 2024
16 checks passed
@VincyZhang VincyZhang deleted the recover_qparms branch March 18, 2024 14:15