This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

support Avx2 #493

Merged
merged 11 commits into from
Oct 20, 2023

Conversation

yuchengliu1
Contributor

@yuchengliu1 yuchengliu1 commented Oct 18, 2023

Type of Change

feature
No API changed

Run LLMs on client CPUs with AVX2 (without AVX512)

Detailed description
JIRA ticket: xxx

Expected Behavior & Potential Risk

CPU: i7-9850H@2.6GHz
memory: single channel 32GB@2666MHz (memory bandwidth 21.3GB/s)
compute_type is FP32
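The quoted 21.3 GB/s figure can be sanity-checked from the DDR4 numbers above (assuming a single 64-bit channel):

```python
# Single-channel DDR4-2666: 2666 MT/s over a 64-bit (8-byte) bus.
transfers_per_s = 2666e6
bytes_per_transfer = 8
bandwidth_gb_s = transfers_per_s * bytes_per_transfer / 1e9
print(f"{bandwidth_gb_s:.1f} GB/s")  # matches the ~21.3 GB/s quoted above
```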

 

| model | first token | next token |
| --- | --- | --- |
| gptj-6B_q4j_b128 | 382.57 ms (63.76 ms per token) | 220.18 ms per token |
| llama-7B_q4j_b128 | 515.35 ms (73.62 ms per token) | 250.71 ms per token |
| llama2-7B_q4j_b128 | 518.14 ms (74.02 ms per token) | 253.23 ms per token |

 

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

@yuchengliu1 yuchengliu1 requested a review from airMeng as a code owner October 18, 2023 09:02
@yuchengliu1 yuchengliu1 mentioned this pull request Oct 18, 2023
@airMeng
Contributor

airMeng commented Oct 19, 2023

I think you already removed all the warnings in this PR?

@airMeng
Contributor

airMeng commented Oct 19, 2023

Tested on i7-9850H@2.6GHz:

 

| model | first token | next token |
| --- | --- | --- |
| gptj-6B_q4j_b128 | 382.57 ms (63.76 ms per token) | 220.18 ms per token |
| llama-7B_q4j_b128 | 515.35 ms (73.62 ms per token) | 250.71 ms per token |
| llama2-7B_q4j_b128 | 518.14 ms (74.02 ms per token) | 253.23 ms per token |

Can you provide a memory bandwidth comparison between client CPUs and SPR? It's useful for judging whether the gap is meaningful.
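A rough roofline-style sketch of why bandwidth matters here (assumptions: next-token decode is bound by streaming the quantized weights once per token; the ~4 GB weight size and the SPR-class bandwidth figure are illustrative, not measurements):

```python
def min_next_token_ms(model_bytes: float, bandwidth_gb_s: float) -> float:
    """Lower bound on next-token latency if decode must stream all weights
    from memory once per generated token (roofline-style estimate)."""
    return model_bytes / (bandwidth_gb_s * 1e9) * 1e3

WEIGHTS = 4e9  # ~4 GB for a 7B model at ~4.5 bits/weight (illustrative)
print(f"client (21.3 GB/s): {min_next_token_ms(WEIGHTS, 21.3):.0f} ms floor")
print(f"SPR-class (300 GB/s, assumed): {min_next_token_ms(WEIGHTS, 300):.1f} ms floor")
```

On the single-channel client this gives a floor of roughly 188 ms per token, the same ballpark as the ~250 ms measured above, which suggests the next-token numbers are largely bandwidth-bound.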

@@ -61,7 +61,7 @@ cd build
cmake ..
cmake --build . -j
```

Note: add compile args ```-DNE_AVX512=OFF -DNE_AVX512_VBMI=OFF -DNE_AVX512_VNNI=OFF``` to ```cmake``` when compiling it on a CPU without AVX512
Contributor

@a32543254 a32543254 Oct 19, 2023


Could we auto-detect the machine's ISA and add the following compile args without manual steps? Some consumers may not be sure about their machine's ISA.

Contributor Author


Fixed in PR #511.

Contributor


I don't think so. If I remember correctly, there was a similar discussion for the deprecated executor, and the conclusion was that the machine compiling the executable cannot know, at compile time, the hardware the binary will actually run on.
cc @luoyu-intel
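Runtime ISA detection itself is straightforward; the limitation discussed above is that the code paths the compiler emitted are already fixed when the binary is built. A minimal detection sketch (Linux-only assumption: it reads `/proc/cpuinfo`; this is not the project's actual mechanism):

```python
import os

def cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags advertised by the kernel (Linux only)."""
    if not os.path.exists(path):
        return set()  # non-Linux: no cpuinfo to read
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
print("avx2 supported:", "avx2" in flags)
print("avx512f supported:", "avx512f" in flags)
```

Dispatching on the result at run time still requires the binary to contain both code paths, which is why the compile-time flags above are needed.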

@yuchengliu1
Contributor Author

I think you already removed all the warnings in this PR?

Yes, there are no warnings now.

Contributor

@a32543254 a32543254 left a comment


LGTM

@a32543254
Contributor

a32543254 commented Oct 19, 2023

It would be better if we also compared the same models' performance between ITREX AVX2 and GGML AVX2 to see our benefit or gap.

@kevinintel kevinintel merged commit ea69f9a into main Oct 20, 2023
11 checks passed
@kevinintel kevinintel deleted the avx2 branch October 20, 2023 03:14
@yuchengliu1
Contributor Author

yuchengliu1 commented Oct 20, 2023

It would be better if we also compared the same models' performance between ITREX AVX2 and GGML AVX2 to see our benefit or gap.

| llama2-7B | first token | next token |
| --- | --- | --- |
| llama.cpp | 494.15 ms (61.77 ms per token) | 141.69 ms per token |
| ITREX q4_0 | 743.89 ms (106.27 ms per token) | 136.37 ms per token |
| ITREX q4_j | 518.14 ms (74.02 ms per token) | 253.23 ms per token |
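Simple arithmetic on those figures gives the relative next-token latency versus llama.cpp:

```python
# Next-token ms-per-token figures from the llama2-7B table above.
llama_cpp = 141.69
for name, ms in {"ITREX q4_0": 136.37, "ITREX q4_j": 253.23}.items():
    print(f"{name}: {ms / llama_cpp:.2f}x llama.cpp")
# ITREX q4_0 is slightly faster (0.96x); q4_j is ~1.79x slower.
```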

The profiling of ITREX AVX2 (profiled on q4_j, without FFN fusion):

```
perf_total_per_op_us[           ADD] = 0.184 ms
perf_total_per_op_us[           MUL] = 0.327 ms
perf_total_per_op_us[          SILU] = 3.488 ms
perf_total_per_op_us[      RMS_NORM] = 0.396 ms
perf_total_per_op_us[       MUL_MAT] = 2.497 ms
perf_total_per_op_us[         SCALE] = 0.032 ms
perf_total_per_op_us[           CPY] = 1.177 ms
perf_total_per_op_us[       RESHAPE] = 0.037 ms
perf_total_per_op_us[          VIEW] = 0.066 ms
perf_total_per_op_us[       PERMUTE] = 0.029 ms
perf_total_per_op_us[     TRANSPOSE] = 0.006 ms
perf_total_per_op_us[      GET_ROWS] = 0.008 ms
perf_total_per_op_us[ DIAG_MASK_INF] = 0.025 ms
perf_total_per_op_us[      SOFT_MAX] = 0.166 ms
perf_total_per_op_us[          ROPE] = 2.940 ms
perf_total_per_op_us[ INNER PRODUCT] = 254.974 ms
```

The profiling of GGML AVX2 (profiled on q4_0):

```
perf_total_per_op_us[           ADD] = 0.119 ms
perf_total_per_op_us[           MUL] = 0.244 ms
perf_total_per_op_us[          SILU] = 3.395 ms
perf_total_per_op_us[      RMS_NORM] = 0.345 ms
perf_total_per_op_us[       MUL_MAT] = 2.281 ms
perf_total_per_op_us[         SCALE] = 0.024 ms
perf_total_per_op_us[           CPY] = 0.976 ms
perf_total_per_op_us[       RESHAPE] = 0.032 ms
perf_total_per_op_us[          VIEW] = 0.063 ms
perf_total_per_op_us[       PERMUTE] = 0.023 ms
perf_total_per_op_us[     TRANSPOSE] = 0.017 ms
perf_total_per_op_us[      GET_ROWS] = 0.004 ms
perf_total_per_op_us[ DIAG_MASK_INF] = 0.015 ms
perf_total_per_op_us[      SOFT_MAX] = 0.117 ms
perf_total_per_op_us[          ROPE] = 0.271 ms
perf_total_per_op_us[ INNER PRODUCT] = 123.976 ms
```
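To make the comparison concrete, the dumps can be diffed per op. A small parsing sketch over three representative lines copied from the profiles above (nearly the entire next-token gap sits in INNER PRODUCT, about 2x, and ROPE, about 10x):

```python
import re

# Representative lines copied from the two profiles above.
itrex = """perf_total_per_op_us[ SILU] = 3.488 ms
perf_total_per_op_us[ ROPE] = 2.940 ms
perf_total_per_op_us[ INNER PRODUCT] = 254.974 ms"""
ggml = """perf_total_per_op_us[ SILU] = 3.395 ms
perf_total_per_op_us[ ROPE] = 0.271 ms
perf_total_per_op_us[ INNER PRODUCT] = 123.976 ms"""

def parse(dump):
    """Map op name -> total time in ms from a perf_total_per_op_us dump."""
    return {op.strip(): float(ms)
            for op, ms in re.findall(r"\[\s*([A-Z_ ]+?)\s*\] = ([\d.]+) ms", dump)}

a, b = parse(itrex), parse(ggml)
for op in a:
    print(f"{op}: ITREX {a[op]:.3f} ms vs GGML {b[op]:.3f} ms ({a[op] / b[op]:.2f}x)")
```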

zhenwei-intel pushed a commit that referenced this pull request Oct 23, 2023
* support Memcpy2D

* support gelu fusion

---------

Co-authored-by: luoyu-intel <yu.luo@intel.com>
VincyZhang added a commit that referenced this pull request Oct 23, 2023
* [CPP Graph] Opt qbits dequant (#465)

* use INC 2.3.1

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

* use INC 2.3.1 (#500)

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

* [RUNTIME] Enabing streaming llm for Runtime (#501)

* Support StreamingLLM on CPU

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

* Reduce the UT evaluation time (#498)

Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: Wenxin Zhang <wenxin.zhang@intel.com>

* Minor fix (#507)

* Fix ChatGLM2 model loading issue (#510)

* Fix ChatGLM2 model loading issue

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Remove OneDNN env setint for BF16 inference (#509)

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: VincyZhang <wenxin.zhang@intel.com>

* support Avx2 (#493)

* support Memcpy2D

* support gelu fusion

---------

Co-authored-by: luoyu-intel <yu.luo@intel.com>

* add neuralchat ut for audio util (#466)

* reduce ut time consumption (#499)

Signed-off-by: Xin He <xin3.he@intel.com>

* update python api readme (#504)

* Add docker setup session for neuralchat finetuning sample (#496)

* Update README.md to new added docker setup session

Signed-off-by: Louie Tsai <louie.tsai@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md for fast token issue (#515)

Signed-off-by: Louie Tsai <louie.tsai@intel.com>

* Fix typo in README.md (#516)

convertion -> conversion

Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* improve Avx2  (#511)

* Revert "update python api readme (#504)"

This reverts commit 5f4175a.

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md (#519)

Signed-off-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>

* docs: fix typos in question answering of pytorch (#520)

Signed-off-by: Surav Shrestha <suravshresth@gmail.com>

* fixed typos (#522)

* Updated README.md (#517)

Signed-off-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>

* update python api readme

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

* fix readme

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Add Data type description
Align Doc and help info

Signed-off-by: Hengyu Meng <hengyu.meng@intel.com>

* align

Signed-off-by: Hengyu Meng <hengyu.meng@intel.com>

* fix eos token id

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

---------

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: Haihao Shen <haihao.shen@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Signed-off-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>
Signed-off-by: Surav Shrestha <suravshresth@gmail.com>
Signed-off-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>
Signed-off-by: Dong, Bo <bo1.dong@intel.com>
Signed-off-by: Hengyu Meng <hengyu.meng@intel.com>
Co-authored-by: Wang, Zhe <zhe1.wang@intel.com>
Co-authored-by: Wenxin Zhang <wenxin.zhang@intel.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Haihao Shen <haihao.shen@intel.com>
Co-authored-by: yuchengliu1 <yucheng.liu@intel.com>
Co-authored-by: luoyu-intel <yu.luo@intel.com>
Co-authored-by: Liangyx2 <106130696+Liangyx2@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>
Co-authored-by: Surav Shrestha <148626286+shresthasurav@users.noreply.github.com>
Co-authored-by: Smoothieewastaken <86610201+Smoothieewastaken@users.noreply.github.com>
Co-authored-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>
Co-authored-by: Dong, Bo <bo1.dong@intel.com>
Co-authored-by: Hengyu Meng <hengyu.meng@intel.com>
VincyZhang pushed a commit that referenced this pull request Oct 23, 2023
* support Memcpy2D

* support gelu fusion

---------

Co-authored-by: luoyu-intel <yu.luo@intel.com>