This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

QBits adapt to the latest BesTLA #1535

Merged
merged 16 commits into from
May 13, 2024

Conversation

zhewang1-intc
Contributor

@zhewang1-intc zhewang1-intc commented May 9, 2024

Type of Change

feature or bug fix or documentation or others: feature
API changed or not: yes

Description

  1. Support the int2/int3 weight-dtypes.
  2. Deprecate the int4_fullrange weight-dtype.
  3. Update the int2/int3/int4 definitions.
  4. Support asym with int8 compute-dtype and bf16 scale-dtype.
  5. Support an optimized thread pool for hybrid-architecture client CPUs.
  6. Allow users to set a flexible number of threads when running inference with QBits.
  7. Support int8 compute with the AVX2 ISA.
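To make the sym/asym distinction concrete, here is a minimal self-contained sketch of per-tensor symmetric vs. asymmetric low-bit weight quantization. This is an illustration of the general scheme only, not the QBits/BesTLA API or its packing layout; the function name and bounds handling are my own.

```python
# Illustrative sketch (NOT the QBits API): symmetric vs. asymmetric
# integer quantization, the two schemes compared in this PR.
def quantize_dequantize(weights, bits, asym):
    qmax = 2 ** (bits - 1) - 1   # e.g. 7 for int4
    qmin = -(2 ** (bits - 1))    # e.g. -8
    if asym:
        # Asymmetric: map [min, max] onto the full integer range via a zero point.
        lo, hi = min(weights), max(weights)
        scale = (hi - lo) / (2 ** bits - 1) or 1.0
        zero = round(-lo / scale) + qmin
        q = [max(qmin, min(qmax, round(w / scale) + zero)) for w in weights]
        return [(v - zero) * scale for v in q]
    # Symmetric: scale by the max magnitude; the zero point is implicitly 0.
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return [v * scale for v in q]

w = [0.1, -0.4, 0.25, 0.9]
print(quantize_dequantize(w, 4, asym=False))
print(quantize_dequantize(w, 4, asym=True))
```

Asym spends its extra zero-point parameter to cover skewed weight distributions, which is why an asym accuracy regression (as discussed below in this thread) points at the zero-point handling rather than the scale.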

Expected Behavior & Potential Risk

The expected behavior triggered by this PR is described above.

How has this PR been tested?

Tested on an Intel Xeon 8480+ and an Intel Core 10900.

Dependency Change?

any library dependency introduced or removed: upgrade BesTLA

@zhewang1-intc zhewang1-intc requested a review from VincyZhang as a code owner May 9, 2024 01:28

github-actions bot commented May 9, 2024

⚡ Required checks status: All passing 🟢

Groups summary

🟢 Format Scan Tests workflow
Check ID Status Error details
format-scan (pylint) success
format-scan (bandit) success
format-scan (cloc) success
format-scan (cpplint) success

These checks are required after the changes to intel_extension_for_transformers/qbits/CMakeLists.txt, intel_extension_for_transformers/qbits/dispatcher/CMakeLists.txt, intel_extension_for_transformers/qbits/dispatcher/include/bestla_customop.hpp, intel_extension_for_transformers/qbits/dispatcher/include/bestla_weightonly_dispatcher.hpp, intel_extension_for_transformers/qbits/dispatcher/include/dispatcher_utils.hpp, intel_extension_for_transformers/qbits/dispatcher/neural_speed.cmake, intel_extension_for_transformers/qbits/dispatcher/src/bestla_gemm_dispatcher.cpp, intel_extension_for_transformers/qbits/dispatcher/src/bestla_packq_impl.cpp, intel_extension_for_transformers/qbits/dispatcher/src/bestla_weightonly_dispatcher.cpp, intel_extension_for_transformers/qbits/qbits.cpp, intel_extension_for_transformers/qbits/qbits_ut/test_packq.py, intel_extension_for_transformers/qbits/qbits_ut/test_weightonly.py, intel_extension_for_transformers/transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/transformers/utils/config.py.

🟢 Optimize Unit Test workflow
Check ID Status Error details
optimize-unit-test-baseline success
optimize-unit-test-PR-test success
Genreate-OptimizeUT-Report success

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/transformers/utils/config.py, tests/CI/test_quantization.py, tests/CI/test_weight_only.py.

🟢 NeuralChat Unit Test
Check ID Status Error details
neuralchat-unit-test-baseline success
neuralchat-unit-test-PR-test success
Generate-NeuralChat-Report success

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/transformers/utils/config.py.

🟢 Engine Unit Test workflow
Check ID Status Error details
engine-unit-test-baseline success
engine-unit-test-PR-test success
Genreate-Engine-Report success

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/transformers/utils/config.py.

🟢 Chat Bot Test workflow
Check ID Status Error details
call-inference-llama-2-7b-chat-hf / inference test success
call-inference-mpt-7b-chat / inference test success

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/nn/modules.py, intel_extension_for_transformers/transformers/utils/config.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds over the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

@changwangss
Contributor

changwangss commented May 9, 2024

There is an accuracy gap for the asym scheme.
v1.4.1 release + #1537

python run_generation_cpu_woq.py --model facebook/opt-125m --woq --bits 4 --accuracy  --batch_size 56 --scheme asym
|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|lambada_openai|      1|none  |     0|perplexity|29.0846|±  |1.1120|
|              |       |none  |     0|acc       | 0.3677|±  |0.0067|
python run_generation_cpu_woq.py --model facebook/opt-125m --woq --bits 4 --accuracy  --batch_size 56 --scheme sym
hf (pretrained=./saved_results,trust_remote_code=False), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 56
|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|lambada_openai|      1|none  |     0|perplexity|41.0024|±  |1.5290|
|              |       |none  |     0|acc       | 0.3150|±  |0.0065|

Accuracy for lambada_openai is: 0.3149621579662333

This PR at commit bf511a3 + #1537

python run_generation_cpu_woq.py --model facebook/opt-125m --woq --bits 4 --accuracy  --batch_size 56 --scheme asym
hf (pretrained=./saved_results,trust_remote_code=False), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 56
|    Tasks     |Version|Filter|n-shot|  Metric  | Value  |   |Stderr|
|--------------|------:|------|-----:|----------|-------:|---|-----:|
|lambada_openai|      1|none  |     0|perplexity|166.3867|±  |8.3104|
|              |       |none  |     0|acc       |  0.2206|±  |0.0058|
python run_generation_cpu_woq.py --model facebook/opt-125m --woq --bits 4 --benchmark  --batch_size 1 --scheme asym

['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places, and meet new people, and meet new people, and meet people, and meet people, and meet people, and meet people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places, and meet new people, and meet new people, and meet people, and meet people, and meet people, and meet people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places, and meet new people, and meet new people, and meet people, and meet people, and meet people, and meet people']
python run_generation_cpu_woq.py --model facebook/opt-125m --woq --bits 4 --accuracy  --batch_size 56 --scheme sym
hf (pretrained=./saved_results,trust_remote_code=False), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 56
|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|lambada_openai|      1|none  |     0|perplexity|41.0024|±  |1.5290|
|              |       |none  |     0|acc       | 0.3150|±  |0.0065|
python run_generation_cpu_woq.py --model facebook/opt-125m --woq --bits 4 --benchmark --batch_size 1 --scheme sym
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places and meet new people. She wanted to go to places and meet new people. She wanted to go to places and meet new people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places and meet new people. She wanted to go to places and meet new people. She wanted to go to places and meet new people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places and meet new people. She wanted to go to places and meet new people. She wanted to go to places and meet new people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places and meet new people. She wanted to go to places and meet new people. She wanted to go to places and meet new people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places and meet new people. She wanted to go to places and meet new people. She wanted to go to places and meet new people']
['Once upon a time, there existed a little girl, who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to go to places and meet new people. She wanted to go to places and meet new people. She wanted to go to places and meet new people']
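The perplexity figures in the tables above come from the lambada_openai task. As a reminder of what the metric reports, here is a minimal self-contained sketch (not the lm-eval implementation) of perplexity as the exponential of the mean negative log-likelihood over tokens:

```python
import math

# Minimal sketch: perplexity = exp(mean negative log-likelihood).
# A jump from ~29 (asym, v1.4.1) to ~166 (asym, this PR at bf511a3) on the
# same data means the model assigns much lower probability to the reference
# tokens, matching the degenerate repeated generations shown above.
def perplexity(token_logprobs):
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

print(perplexity([math.log(0.25)] * 8))  # ≈ 4.0: uniform probability 1/4 per token
```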

zhewang1-intc and others added 5 commits May 9, 2024 16:04
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
@changwangss
Contributor

changwangss commented May 11, 2024

With commit bff7973:

python run_generation_cpu_woq.py --model facebook/opt-125m --woq --batch_size 56 --accuracy --scheme asym --bits 4
|    Tasks     |Version|Filter|n-shot|  Metric  | Value |   |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|lambada_openai|      1|none  |     0|perplexity|29.0845|±  |1.1120|
|              |       |none  |     0|acc       | 0.3677|±  |0.0067|

@changwangss
Contributor

Format scan depends on the #1539 fix.

@VincyZhang VincyZhang merged commit c169bec into main May 13, 2024
22 checks passed
@VincyZhang VincyZhang deleted the update_qbits branch May 13, 2024 06:00