
Improve WOQ model saving and loading #1375

Merged
merged 13 commits into main from recover_qparms
Mar 18, 2024

Conversation

changwangss
Contributor

@changwangss changwangss commented Mar 13, 2024

Type of Change

Code Changes

  1. Rename all algorithm names: RTN to Rtn, AWQ to Awq, TEQ to Teq, AUTOROUND to AutoRound; GPTQ is unchanged.
  2. For the Rtn/Awq/Teq algorithms, the backend now gets scales and zero points from INC instead of computing them itself.
  3. Saving: the checkpoint weight format, config.json, and quantize_config.json follow the Transformers-like API convention.
python run_generation.py --model facebook/opt-125m --woq --bits 4 --output_dir "saved_results"
  4. Loading: Transformers-like, from the saved directory (see the Python sketch after this list).
python run_generation.py --model "saved_results"

Local validation

Rtn

# int4_clip
python run_generation.py --model facebook/opt-125m --woq --bits 4
python run_generation.py --model facebook/opt-125m --woq --bits 4  --scheme "asym"
# fp8_e5m2
python run_generation.py --model facebook/opt-125m --woq --bits 8 --weight_dtype "fp8_e5m2"
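
The Rtn runs above map onto the Python API roughly as follows; this is a sketch that assumes RtnConfig exposes bits, scheme, and weight_dtype parameters mirroring the CLI flags (assumed names, not verified against the final API):

from intel_extension_for_transformers.transformers import AutoModelForCausalLM, RtnConfig

# int4_clip, asymmetric scheme (mirrors --bits 4 --scheme "asym")
int4_config = RtnConfig(bits=4, scheme="asym")
# fp8_e5m2 weights (mirrors --bits 8 --weight_dtype "fp8_e5m2")
fp8_config = RtnConfig(bits=8, weight_dtype="fp8_e5m2")

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", quantization_config=int4_config)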

Awq

python run_generation.py --model facebook/opt-125m --woq --woq_algo "Awq" --bits 4 --scheme "asym"

Teq

python run_generation.py --model facebook/opt-125m --woq --woq_algo "Teq" --bits 4 --scheme "asym"

GPTQ

python run_generation.py --model facebook/opt-125m --woq --woq_algo "GPTQ" --bits 4 --scheme "asym" --desc_act --blocksize 32 --nsamples 32 --max_input_length 256 --static_groups

AutoRound

python run_generation.py --model facebook/opt-125m --woq --woq_algo "AutoRound" --bits 4 --scheme "asym" --calib_len 1024 --nsamples 128

Description

Detailed description
JIRA ticket: xxx

Expected Behavior & Potential Risk

The expected behavior triggered by this PR.

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

zhewang1-intc and others added 2 commits March 12, 2024 22:54
Signed-off-by: changwangss <chang1.wang@intel.com>

github-actions bot commented Mar 13, 2024

⚡ Required checks status: All passing 🟢

Groups summary

🟢 Format Scan Tests workflow
Check ID Status Error details
format-scan (pylint) success
format-scan (bandit) success
format-scan (clangformat) success
format-scan (cloc) success
format-scan (cpplint) success

These checks are required after the changes to intel_extension_for_transformers/llm/operator/csrc/dispatcher/include/bestla_packq_impl.hpp, intel_extension_for_transformers/llm/operator/csrc/dispatcher/src/bestla_packq_impl.cpp, intel_extension_for_transformers/llm/operator/csrc/qbits_ut/test_packq.py, intel_extension_for_transformers/llm/quantization/gptq_utils.py, intel_extension_for_transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/llm/quantization/utils.py, intel_extension_for_transformers/transformers/modeling/modeling_auto.py, intel_extension_for_transformers/transformers/utils/config.py.

🟢 Optimize Unit Test workflow
Check ID Status Error details
optimize-unit-test-baseline success
optimize-unit-test-PR-test success
Genreate-OptimizeUT-Report success

These checks are required after the changes to intel_extension_for_transformers/transformers/modeling/modeling_auto.py, intel_extension_for_transformers/transformers/utils/config.py, intel_extension_for_transformers/llm/quantization/gptq_utils.py, intel_extension_for_transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/llm/quantization/utils.py, intel_extension_for_transformers/llm/operator/csrc/dispatcher/include/bestla_packq_impl.hpp, intel_extension_for_transformers/llm/operator/csrc/dispatcher/src/bestla_packq_impl.cpp, intel_extension_for_transformers/llm/operator/csrc/qbits_ut/test_packq.py, tests/CI/test_quantization.py, tests/CI/test_weight_only.py.

🟢 NeuralChat Unit Test
Check ID Status Error details
neuralchat-unit-test-baseline success
neuralchat-unit-test-PR-test success
Generate-NeuralChat-Report success

These checks are required after the changes to intel_extension_for_transformers/llm/quantization/gptq_utils.py, intel_extension_for_transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/llm/quantization/utils.py, intel_extension_for_transformers/transformers/modeling/modeling_auto.py, intel_extension_for_transformers/transformers/utils/config.py.

🟢 Chat Bot Test workflow
Check ID Status Error details
call-inference-llama-2-7b-chat-hf / inference test success
call-inference-mpt-7b-chat / inference test success

These checks are required after the changes to intel_extension_for_transformers/llm/quantization/gptq_utils.py, intel_extension_for_transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/llm/quantization/utils.py, intel_extension_for_transformers/transformers/modeling/modeling_auto.py, intel_extension_for_transformers/transformers/utils/config.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and is updated every 180 seconds for 360 minutes. If you have any other questions, contact VincyZhang or XuehaoSun for help.

Contributor

@a32543254 a32543254 left a comment


LGTM

@a32543254 a32543254 closed this Mar 14, 2024
@a32543254 a32543254 reopened this Mar 14, 2024
@a32543254
Contributor

a32543254 commented Mar 14, 2024

After this PR, can we load an HF GPTQ/AWQ model for inference on both CPU and GPU?

Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
@changwangss
Contributor Author

nf4, fp4, and int4_fullrange are waiting for the kernel to be ready.
The int8 issue will be fixed later.
Docs and the README will be updated in the next PR.

@VincyZhang VincyZhang merged commit 30d9d10 into main Mar 18, 2024
16 checks passed
@VincyZhang VincyZhang deleted the recover_qparms branch March 18, 2024 14:15