Conversation
Resolved review threads:

- intel_extension_for_transformers/llm/runtime/graph/vectors/cpu/vec_base.hpp
- intel_extension_for_transformers/llm/runtime/graph/vectors/cpu/vec_compare.hpp
I think you have already removed all the warnings in this PR?
Can you provide a memory bandwidth comparison between client CPUs and SPR? It would be useful for judging whether the gap is meaningful.
@@ -61,7 +61,7 @@ cd build
cmake ..
cmake --build . -j

Note: add compile args `-DNE_AVX512=OFF -DNE_AVX512_VBMI=OFF -DNE_AVX512_VNNI=OFF` to `cmake` when compiling on a CPU without AVX512.
Could we auto-detect the machine's ISA and add the corresponding compile args automatically, without manual steps? Some consumers may not be sure about their machine's ISA.
fix in PR511
I don't think so. If I remember correctly, there was a similar discussion for the deprecated executor, and the conclusion was that the machine compiling the executable cannot know, at compile time, the hardware of the machine that will eventually run it.
cc @luoyu-intel
Resolved review threads:

- intel_extension_for_transformers/llm/runtime/graph/vectors/cpu/vec_load.cpp
- intel_extension_for_transformers/llm/runtime/graph/models/model_utils/model_utils.cpp
Yes, there are no warnings now.
LGTM
It would be better to also compare the same model's performance between ITREX AVX2 and GGML AVX2, to quantify our benefit or gap.
Profiling of ITREX AVX2 (profiled on q4_j, without FFN fusion):

```
perf_total_per_op_us[ ADD] = 0.184 ms
```

Profiling of GGML AVX2 (profiled on q4_0):

```
perf_total_per_op_us[ ADD] = 0.119 ms
```
* support Memcpy2D
* support gelu fusion

---------

Co-authored-by: luoyu-intel <yu.luo@intel.com>
* [CPP Graph] Opt qbits dequant (#465)
* use INC 2.3.1
* use INC 2.3.1 (#500)
* [RUNTIME] Enabling streaming llm for Runtime (#501): Support StreamingLLM on CPU
* Reduce the UT evaluation time (#498)
* Minor fix (#507)
* Fix ChatGLM2 model loading issue (#510)
* Remove OneDNN env setting for BF16 inference (#509)
* support Avx2 (#493): support Memcpy2D, support gelu fusion
* add neuralchat ut for audio util (#466)
* reduce ut time consumption (#499)
* update python api readme (#504)
* Add docker setup session for neuralchat finetuning sample (#496)
* Update README.md (multiple revisions)
* Update README.md for fast token issue (#515)
* Fix typo in README.md (#516): convertion -> conversion
* improve Avx2 (#511)
* Revert "update python api readme (#504)" (reverts commit 5f4175a)
* Update README.md (#519)
* docs: fix typos in question answering of pytorch (#520)
* fixed typos (#522)
* Updated README.md (#517)
* update python api readme
* fix readme
* Add Data type description; Align Doc and help info
* fix eos token id

---------

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: Haihao Shen <haihao.shen@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Signed-off-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>
Signed-off-by: Surav Shrestha <suravshresth@gmail.com>
Signed-off-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>
Signed-off-by: Dong, Bo <bo1.dong@intel.com>
Signed-off-by: Hengyu Meng <hengyu.meng@intel.com>

Co-authored-by: Wang, Zhe <zhe1.wang@intel.com>
Co-authored-by: Wenxin Zhang <wenxin.zhang@intel.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Haihao Shen <haihao.shen@intel.com>
Co-authored-by: yuchengliu1 <yucheng.liu@intel.com>
Co-authored-by: luoyu-intel <yu.luo@intel.com>
Co-authored-by: Liangyx2 <106130696+Liangyx2@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>
Co-authored-by: Surav Shrestha <148626286+shresthasurav@users.noreply.github.com>
Co-authored-by: Smoothieewastaken <86610201+Smoothieewastaken@users.noreply.github.com>
Co-authored-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>
Co-authored-by: Dong, Bo <bo1.dong@intel.com>
Co-authored-by: Hengyu Meng <hengyu.meng@intel.com>
Type of Change
feature
No API changed
Run LLMs on client CPUs with AVX2 (without AVX512)
detail description
JIRA ticket: xxx
Expected Behavior & Potential Risk
CPU: i7-9850H @ 2.6 GHz
Memory: single channel 32 GB @ 2666 MHz (memory bandwidth 21.3 GB/s)
compute_type: FP32
How has this PR been tested?
how to reproduce the test (including hardware information)
Dependency Change?
any library dependency introduced or removed