This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

QBits adapt to the latest BesTLA #1535

Merged
merged 16 commits into from
May 13, 2024

Conversation

zhewang1-intc
Contributor

@zhewang1-intc zhewang1-intc commented May 9, 2024

Type of Change

feature or bug fix or documentation or others: feature
API changed or not: yes

Description

  1. Support the int2/int3 weight-dtypes.
  2. Deprecate the int4_fullrange weight-dtype.
  3. Update the int2/int3/int4 definitions.
  4. Support asym with int8 compute-dtype and bf16 scale-dtype.
  5. Support an optimized thread pool for hybrid-architecture client CPUs.
  6. Allow users to set a flexible number of threads when running inference with QBits.
  7. Support int8 compute with the AVX2 ISA.
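To make the sym/asym distinction concrete, here is a minimal self-contained sketch of per-tensor symmetric vs. asymmetric low-bit weight quantization. This is an illustration of the general scheme only, not the QBits/BesTLA API or its packing layout; the function name and bounds handling are my own.

```python
# Illustrative sketch (NOT the QBits API): symmetric vs. asymmetric
# integer quantization, the two schemes compared in this PR.
def quantize_dequantize(weights, bits, asym):
    qmax = 2 ** (bits - 1) - 1   # e.g. 7 for int4
    qmin = -(2 ** (bits - 1))    # e.g. -8
    if asym:
        # Asymmetric: map [min, max] onto the full integer range via a zero point.
        lo, hi = min(weights), max(weights)
        scale = (hi - lo) / (2 ** bits - 1) or 1.0
        zero = round(-lo / scale) + qmin
        q = [max(qmin, min(qmax, round(w / scale) + zero)) for w in weights]
        return [(v - zero) * scale for v in q]
    # Symmetric: scale by the max magnitude; the zero point is implicitly 0.
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return [v * scale for v in q]

w = [0.1, -0.4, 0.25, 0.9]
print(quantize_dequantize(w, 4, asym=False))
print(quantize_dequantize(w, 4, asym=True))
```

Asym spends its extra zero-point parameter to cover skewed weight distributions, which is why an asym accuracy regression (as discussed below in this thread) points at the zero-point handling rather than the scale.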

Expected Behavior & Potential Risk

The expected behavior triggered by this PR is described above.

How has this PR been tested?

Tested on an Intel Xeon 8480+ and an Intel Core 10900.

Dependency Change?

any library dependency introduced or removed: upgrade BesTLA

@zhewang1-intc zhewang1-intc requested a review from VincyZhang as a code owner May 9, 2024 01:28

github-actions bot commented May 9, 2024

⚡ Required checks status: All passing 🟢

Groups summary

🟢 Format Scan Tests workflow
Check ID Status Error details
format-scan (pylint) success
format-scan (bandit) success
format-scan (cloc) success
format-scan (cpplint) success

These checks are required after the changes to intel_extension_for_transformers/qbits/CMakeLists.txt, intel_extension_for_transformers/qbits/dispatcher/CMakeLists.txt, intel_extension_for_transformers/qbits/dispatcher/include/bestla_customop.hpp, intel_extension_for_transformers/qbits/dispatcher/include/bestla_weightonly_dispatcher.hpp, intel_extension_for_transformers/qbits/dispatcher/include/dispatcher_utils.hpp, intel_extension_for_transformers/qbits/dispatcher/neural_speed.cmake, intel_extension_for_transformers/qbits/dispatcher/src/bestla_gemm_dispatcher.cpp, intel_extension_for_transformers/qbits/dispatcher/src/bestla_packq_impl.cpp, intel_extension_for_transformers/qbits/dispatcher/src/bestla_weightonly_dispatcher.cpp, intel_extension_for_transformers/qbits/qbits.cpp, intel_extension_for_transformers/qbits/qbits_ut/test_packq.py, intel_extension_for_transformers/qbits/qbits_ut/test_weightonly.py, intel_extension_for_transformers/transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/transformers/utils/config.py.

🟢 Optimize Unit Test workflow
Check ID Status Error details
optimize-unit-test-baseline success
optimize-unit-test-PR-test success
Genreate-OptimizeUT-Report success

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/transformers/utils/config.py, tests/CI/test_quantization.py, tests/CI/test_weight_only.py.

🟢 NeuralChat Unit Test
Check ID Status Error details
neuralchat-unit-test-baseline success
neuralchat-unit-test-PR-test success
Generate-NeuralChat-Report success

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/transformers/utils/config.py.

🟢 Engine Unit Test workflow
Check ID Status Error details
engine-unit-test-baseline success
engine-unit-test-PR-test success
Genreate-Engine-Report success

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/transformers/utils/config.py.

🟢 Chat Bot Test workflow
Check ID Status Error details
call-inference-llama-2-7b-chat-hf / inference test success
call-inference-mpt-7b-chat / inference test success

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/transformers/utils/config.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds over the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

@changwangss
Contributor

changwangss commented May 9, 2024

There is an accuracy gap for the asym scheme.
v1.4.1 release + #1537

python run_generation_cpu_woq.py --model facebook/opt-125m --woq --bits 4 --accuracy  --batch_size 56 --scheme asym
|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|lambada_openai|      1|none  |     0|perplexity|29.0846|±  |1.1120|
|              |       |none  |     0|acc       | 0.3677|±  |0.0067|
python run_generation_cpu_woq.py --model facebook/opt-125m --woq --bits 4 --accuracy  --batch_size 56 --scheme sym
hf (pretrained=./saved_results,trust_remote_code=False), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 56
|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|lambada_openai|      1|none  |     0|perplexity|41.0024|±  |1.5290|
|              |       |none  |     0|acc       | 0.3150|±  |0.0065|

Accuracy for lambada_openai is: 0.3149621579662333

This PR at commit bf511a3 + #1537

python run_generation_cpu_woq.py --model facebook/opt-125m --woq --bits 4 --accuracy  --batch_size 56 --scheme asym
hf (pretrained=./saved_results,trust_remote_code=False), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 56
|    Tasks     |Version|Filter|n-shot|  Metric  | Value  |   |Stderr|
|--------------|------:|------|-----:|----------|-------:|---|-----:|
|lambada_openai|      1|none  |     0|perplexity|166.3867|±  |8.3104|
|              |       |none  |     0|acc       |  0.2206|±  |0.0058|
python run_generation_cpu_woq.py --model facebook/opt-125m --woq --bits 4 --benchmark  --batch_size 1 --scheme asym

['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places, and meet new people, and meet new people, and meet people, and meet people, and meet people, and meet people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places, and meet new people, and meet new people, and meet people, and meet people, and meet people, and meet people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places, and meet new people, and meet new people, and meet people, and meet people, and meet people, and meet people']
python run_generation_cpu_woq.py --model facebook/opt-125m --woq --bits 4 --accuracy  --batch_size 56 --scheme sym
hf (pretrained=./saved_results,trust_remote_code=False), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 56
|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|lambada_openai|      1|none  |     0|perplexity|41.0024|±  |1.5290|
|              |       |none  |     0|acc       | 0.3150|±  |0.0065|
python run_generation_cpu_woq.py --model facebook/opt-125m --woq --bits 4 --benchmark --batch_size 1 --scheme sym
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places and meet new people. She wanted to go to places and meet new people. She wanted to go to places and meet new people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places and meet new people. She wanted to go to places and meet new people. She wanted to go to places and meet new people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places and meet new people. She wanted to go to places and meet new people. She wanted to go to places and meet new people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places and meet new people. She wanted to go to places and meet new people. She wanted to go to places and meet new people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places and meet new people. She wanted to go to places and meet new people. She wanted to go to places and meet new people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places and meet new people. She wanted to go to places and meet new people. She wanted to go to places and meet new people']
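The perplexity figures in the tables above come from the lambada_openai task. As a reminder of what the metric reports, here is a minimal self-contained sketch (not the lm-eval implementation) of perplexity as the exponential of the mean negative log-likelihood over tokens:

```python
import math

# Minimal sketch: perplexity = exp(mean negative log-likelihood).
# A jump from ~29 (asym, v1.4.1) to ~166 (asym, this PR at bf511a3) on the
# same data means the model assigns much lower probability to the reference
# tokens, matching the degenerate repeated generations shown above.
def perplexity(token_logprobs):
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

print(perplexity([math.log(0.25)] * 8))  # ≈ 4.0: uniform probability 1/4 per token
```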

zhewang1-intc and others added 5 commits May 9, 2024 16:04
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
@changwangss
Contributor

changwangss commented May 11, 2024

With commit bff7973:

python run_generation_cpu_woq.py --model facebook/opt-125m --woq --batch_size 56 --accuracy --scheme asym --bits 4
|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|lambada_openai|      1|none  |     0|perplexity|29.0845|±  |1.1120|
|              |       |none  |     0|acc       | 0.3677|±  |0.0067|

@changwangss
Contributor

Format scan depends on the #1539 fix.

@VincyZhang VincyZhang merged commit c169bec into main May 13, 2024
22 checks passed
@VincyZhang VincyZhang deleted the update_qbits branch May 13, 2024 06:00