This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

Commit

update python api readme (#523)
* [CPP Graph] Opt qbits dequant (#465)

* use INC 2.3.1

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

* use INC 2.3.1 (#500)

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

* [RUNTIME] Enabling streaming LLM for Runtime (#501)

* Support StreamingLLM on CPU

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

* Reduce the UT evaluation time (#498)

Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: Wenxin Zhang <wenxin.zhang@intel.com>

* Minor fix (#507)

* Fix ChatGLM2 model loading issue (#510)

* Fix ChatGLM2 model loading issue

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Remove OneDNN env setting for BF16 inference (#509)

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: VincyZhang <wenxin.zhang@intel.com>

* support AVX2 (#493)

* support Memcpy2D

* support gelu fusion

---------

Co-authored-by: luoyu-intel <yu.luo@intel.com>

* add neuralchat ut for audio util (#466)

* reduce ut time consumption (#499)

Signed-off-by: Xin He <xin3.he@intel.com>

* update python api readme (#504)

* Add docker setup session for neuralchat finetuning sample (#496)

* Update README.md to new added docker setup session

Signed-off-by: Louie Tsai <louie.tsai@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md for fast token issue (#515)

Signed-off-by: Louie Tsai <louie.tsai@intel.com>

* Fix typo in README.md (#516)

convertion -> conversion

Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* improve AVX2 (#511)

* Revert "update python api readme (#504)"

This reverts commit 5f4175a.

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md (#519)

Signed-off-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>

* docs: fix typos in question answering of pytorch (#520)

Signed-off-by: Surav Shrestha <suravshresth@gmail.com>

* fixed typos (#522)

* Updated README.md (#517)

Signed-off-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>

* update python api readme

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

* fix readme

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Add Data type description
Align Doc and help info

Signed-off-by: Hengyu Meng <hengyu.meng@intel.com>

* align

Signed-off-by: Hengyu Meng <hengyu.meng@intel.com>

* fix eos token id

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

---------

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: Haihao Shen <haihao.shen@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Signed-off-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>
Signed-off-by: Surav Shrestha <suravshresth@gmail.com>
Signed-off-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>
Signed-off-by: Dong, Bo <bo1.dong@intel.com>
Signed-off-by: Hengyu Meng <hengyu.meng@intel.com>
Co-authored-by: Wang, Zhe <zhe1.wang@intel.com>
Co-authored-by: Wenxin Zhang <wenxin.zhang@intel.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Haihao Shen <haihao.shen@intel.com>
Co-authored-by: yuchengliu1 <yucheng.liu@intel.com>
Co-authored-by: luoyu-intel <yu.luo@intel.com>
Co-authored-by: Liangyx2 <106130696+Liangyx2@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>
Co-authored-by: Surav Shrestha <148626286+shresthasurav@users.noreply.github.com>
Co-authored-by: Smoothieewastaken <86610201+Smoothieewastaken@users.noreply.github.com>
Co-authored-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>
Co-authored-by: Dong, Bo <bo1.dong@intel.com>
Co-authored-by: Hengyu Meng <hengyu.meng@intel.com>
18 people authored Oct 23, 2023
1 parent 1ab6ce3 commit 0ce90db
Showing 118 changed files with 1,637 additions and 1,308 deletions.
19 changes: 10 additions & 9 deletions .github/workflows/script/unitTest/env_setup.sh
@@ -7,15 +7,16 @@ if [ ${inc} != 0 ]; then
fi

echo "Install neural_compressor binary..."
n=0
until [ "$n" -ge 5 ]; do
git clone https://github.com/intel/neural-compressor.git /neural-compressor
cd /neural-compressor
pip install -r requirements.txt
python setup.py install && break
n=$((n + 1))
sleep 5
done
pip install neural-compressor
#n=0
#until [ "$n" -ge 5 ]; do
# git clone https://github.com/intel/neural-compressor.git /neural-compressor
# cd /neural-compressor
# pip install -r requirements.txt
# python setup.py install && break
# n=$((n + 1))
# sleep 5
#done

# Install test requirements
cd /intel-extension-for-transformers/tests
4 changes: 3 additions & 1 deletion .github/workflows/unit-test-neuralchat.yml
@@ -73,6 +73,8 @@ jobs:
podman run -dit --disable-content-trust --privileged --name=${{ env.CONTAINER_NAME }} -v /dev/shm:/dev/shm \
-v ${{ github.workspace }}:/intel-extension-for-transformers \
-v ~/.cache/oneAPI:/cache \
-v /models:/models \
-v /media:/media \
${{ env.REPO_NAME }}:${{ env.REPO_TAG }}
- name: Env build
@@ -143,4 +145,4 @@ jobs:
with:
name: Neural Chat Unit Test
path: ${{ github.workspace }}/log_dir
retention-days: 5
retention-days: 5
47 changes: 24 additions & 23 deletions README.md
@@ -11,17 +11,11 @@ Intel® Extension for Transformers
</div>

## 🚀Latest News
* <b>NeuralChat has been showcased in [Intel Innovation’23 Keynote](https://www.youtube.com/watch?v=RbKRELWP9y8&t=2954s) and [Google Cloud Next'23](https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next-23) to demonstrate GenAI/LLM capabilities on Intel Xeon Scalable Processors.</b>
* <b>NeuralChat supports custom chatbot development and deployment on broad Intel HWs such as Xeon Scalable Processors, Gaudi2, Xeon CPU Max Series, Data Center GPU Max Series, Arc Series, and Core Processors. Check out [Notebooks](./intel_extension_for_transformers/neural_chat/docs/full_notebooks.md) and see below sample code. </b>

```python
# pip install intel-extension-for-transformers
from intel_extension_for_transformers.neural_chat import build_chatbot
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
```

* <b>LLM runtime extends Hugging Face Transformers API to provide seamless low precision inference for popular LLMs, supporting mainstream low precision data types such as INT8/FP8/INT4/FP4/NF4.</b>
* [2023/10] LLM runtime, an Intel-optimized [GGML](https://github.com/ggerganov/ggml) compatible runtime, demonstrates **up to 15x performance gain in 1st token generation and 1.5x in other token generation** over the default [llama.cpp](https://github.com/ggerganov/llama.cpp).
* [2023/10] LLM runtime now supports LLM inference with **infinite-length inputs up to 4 million tokens**, inspired from [StreamingLLM](https://arxiv.org/abs/2309.17453).
* [2023/09] NeuralChat has been showcased in [**Intel Innovation’23 Keynote**](https://www.youtube.com/watch?v=RbKRELWP9y8&t=2954s) and [Google Cloud Next'23](https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next-23) to demonstrate GenAI/LLM capabilities on Intel Xeon Scalable Processors.
* [2023/08] NeuralChat supports **custom chatbot development and deployment within minutes** on broad Intel HWs such as Xeon Scalable Processors, Gaudi2, Xeon CPU Max Series, Data Center GPU Max Series, Arc Series, and Core Processors. Check out [Notebooks](./intel_extension_for_transformers/neural_chat/docs/full_notebooks.md).
* [2023/07] LLM runtime extends Hugging Face Transformers API to provide seamless low precision inference for popular LLMs, supporting low precision data types such as INT3/INT4/FP4/NF4/INT5/INT8/FP8.

---
<div align="left">
@@ -34,25 +28,31 @@ pip install intel-extension-for-transformers
> For more installation methods, please refer to [Installation Page](./docs/installation.md)
## 🌟Introduction
Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, in particular effective on 4th Intel Xeon Scalable processor Sapphire Rapids (codenamed [Sapphire Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html)). The toolkit provides the below key features and examples:

Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, in particular, effective on 4th Intel Xeon Scalable processor Sapphire Rapids (codenamed [Sapphire Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html)). The toolkit provides the below key features and examples:

* Seamless user experience of model compressions on Transformer-based models by extending [Hugging Face transformers](https://github.com/huggingface/transformers) APIs and leveraging [Intel® Neural Compressor](https://github.com/intel/neural-compressor)


* Advanced software optimizations and unique compression-aware runtime (released with NeurIPS 2022's paper [Fast Distilbert on CPUs](https://arxiv.org/abs/2211.07715) and [QuaLA-MiniLM: a Quantized Length Adaptive MiniLM](https://arxiv.org/abs/2210.17114), and NeurIPS 2021's paper [Prune Once for All: Sparse Pre-Trained Language Models](https://arxiv.org/abs/2111.05754))


* Optimized Transformer-based model packages such as [Stable Diffusion](examples/huggingface/pytorch/text-to-image/deployment/stable_diffusion), [GPT-J-6B](examples/huggingface/pytorch/text-generation/deployment), [GPT-NEOX](examples/huggingface/pytorch/language-modeling/quantization#2-validated-model-list), [BLOOM-176B](examples/huggingface/pytorch/language-modeling/inference#BLOOM-176B), [T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list), [Flan-T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list) and end-to-end workflows such as [SetFit-based text classification](docs/tutorials/pytorch/text-classification/SetFit_model_compression_AGNews.ipynb) and [document level sentiment analysis (DLSA)](workflows/dlsa)
* Optimized Transformer-based model packages such as [Stable Diffusion](examples/huggingface/pytorch/text-to-image/deployment/stable_diffusion), [GPT-J-6B](examples/huggingface/pytorch/text-generation/deployment), [GPT-NEOX](examples/huggingface/pytorch/language-modeling/quantization#2-validated-model-list), [BLOOM-176B](examples/huggingface/pytorch/language-modeling/inference#BLOOM-176B), [T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list), [Flan-T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list), and end-to-end workflows such as [SetFit-based text classification](docs/tutorials/pytorch/text-classification/SetFit_model_compression_AGNews.ipynb) and [document level sentiment analysis (DLSA)](workflows/dlsa)

* [NeuralChat](intel_extension_for_transformers/neural_chat), a customizable chatbot framework to create your own chatbot within minutes by leveraging a rich set of plugins [Knowledge Retrieval](./intel_extension_for_transformers/neural_chat/pipeline/plugins/retrieval/README.md), [Speech Interaction](./intel_extension_for_transformers/neural_chat/pipeline/plugins/audio/README.md), [Query Caching](./intel_extension_for_transformers/neural_chat/pipeline/plugins/caching/README.md), [Security Guardrail](./intel_extension_for_transformers/neural_chat/pipeline/plugins/security/README.md).


* [Inference](intel_extension_for_transformers/llm/runtime/graph) of Large Language Model (LLM) in pure C/C++ with weight-only quantization kernels, supporting [GPT-NEOX](intel_extension_for_transformers/llm/runtime/graph/models/gptneox), [LLAMA](intel_extension_for_transformers/llm/runtime/graph/models/llama), [MPT](intel_extension_for_transformers/llm/runtime/graph/models/mpt), [FALCON](intel_extension_for_transformers/llm/runtime/graph/models/falcon), [BLOOM-7B](intel_extension_for_transformers/llm/runtime/graph/models/bloom), [OPT](intel_extension_for_transformers/llm/runtime/graph/models/opt), [ChatGLM2-6B](intel_extension_for_transformers/llm/runtime/graph/models/chatglm), [GPT-J-6B](intel_extension_for_transformers/llm/runtime/graph/models/gptj) and [Dolly-v2-3B](intel_extension_for_transformers/llm/runtime/graph/models/gptneox)


## 🌱Getting Started
Below are the sample code to enable weight-only low precision inference. See more [examples](intel_extension_for_transformers/llm/runtime/graph).
Below is the sample code to enable the chatbot. See more [examples](intel_extension_for_transformers/neural_chat/docs/full_notebooks.md).

### Chatbot
```python
# pip install intel-extension-for-transformers
from intel_extension_for_transformers.neural_chat import build_chatbot
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
```

Below is the sample code to enable weight-only INT4/INT8 inference. See more [examples](intel_extension_for_transformers/llm/runtime/graph).

### INT4 Inference
```python
@@ -90,7 +90,7 @@ outputs = tokenizer.batch_decode(gen_tokens)
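The INT4 sample itself is collapsed in this diff view. As a point of reference only, a minimal sketch of a weight-only INT4 call through the extended Transformers API might look like the following; the model id, prompt, and generation settings are illustrative assumptions, not the collapsed snippet.

```python
# Minimal sketch, not the collapsed README snippet: model id, prompt, and
# generation settings are illustrative assumptions.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Intel/neural-chat-7b-v1-1"  # any Hugging Face model id or local path
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids

# load_in_4bit requests weight-only INT4 quantization at model load time
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
gen_tokens = model.generate(inputs, max_new_tokens=300)
outputs = tokenizer.batch_decode(gen_tokens)
print(outputs[0])
```

Weight-only quantization touches only the model weights at load time, so the tokenizer and the generate call stay standard Transformers code.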

## 🎯Validated Models
Here is the average accuracy of validated models on Lambada (OpenAI), HellaSwag, Winogrande, PIQA, and WikiText.
The next token latency is based on 32 input tokens and greedy search on Intel's 4th Generation Xeon Scalable Sapphire Rapids processor.
The subsequent token latency is based on 32 input tokens and greedy search on Intel's 4th Generation Xeon Scalable Sapphire Rapids processor.

| Model | FP32 | INT4 Accuracy (Group size 32) | INT4 Accuracy (Group size 128) | Next Token Latency |
|---------------------|:----------------------:|:-----------------------:|:----------------------------:|:------------:|
@@ -136,8 +136,9 @@ Find other models like ChatGLM, ChatGLM2, StarCoder... in [LLM Runtime](./intel_
</tr>
<tr>
<td colspan="2" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/README.md">LLM Runtime</a></td>
<td colspan="3" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/core/README.md">Low Precision Kernels</a></td>
<td colspan="3" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/tensor_parallelism.md">Tensor Parallelism</a></td>
<td colspan="2" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/README.md#2-run-llm-with-python-api">Streaming LLM</a></td>
<td colspan="2" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/core/README.md">Low Precision Kernels</a></td>
<td colspan="2" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/tensor_parallelism.md">Tensor Parallelism</a></td>
</tr>
<tr>
<th colspan="8" align="center">LLM COMPRESSION</th>
@@ -204,10 +205,10 @@ Find other models like ChatGLM, ChatGLM2, StarCoder... in [LLM Runtime](./intel_


## Acknowledgements
* Excellent open-source projects: [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [FastChat](https://github.com/lm-sys/FastChat), [fastRAG](https://github.com/IntelLabs/fastRAG), [ggml](https://github.com/ggerganov/ggml), [gptq](https://github.com/IST-DASLab/gptq), [llama.cpp](https://github.com/ggerganov/llama.cpp), [lm-evauation-harness](https://github.com/EleutherAI/lm-evaluation-harness), [peft](https://github.com/huggingface/peft), [trl](https://github.com/huggingface/trl), and many others.
* Excellent open-source projects: [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [FastChat](https://github.com/lm-sys/FastChat), [fastRAG](https://github.com/IntelLabs/fastRAG), [ggml](https://github.com/ggerganov/ggml), [gptq](https://github.com/IST-DASLab/gptq), [llama.cpp](https://github.com/ggerganov/llama.cpp), [lm-evauation-harness](https://github.com/EleutherAI/lm-evaluation-harness), [peft](https://github.com/huggingface/peft), [trl](https://github.com/huggingface/trl), [streamingllm](https://github.com/mit-han-lab/streaming-llm) and many others.

* Thanks to all the contributors including [Ikko Eltociear Ashimine](https://github.com/eltociear), [Hardik Kamboj](https://github.com/hardikkamboj), [Sangjune Park](https://github.com/JJukE), [Kevin Ta](https://github.com/kta-intel), [Huiyan Cao](https://github.com/huiyan2021), [Xigui Wang](https://github.com/xiguiw), [Jiafu Zhang](https://github.com/jiafuzha), [Tyler Titsworth](https://github.com/tylertitsworth), [Yi Wang](https://github.com/sywangyi), [Samanway Sadhu](https://github.com/SamanwaySadhu), [Jiqing Feng](https://github.com/jiqing-feng), [Jonathan Mamou](https://github.com/jmamou) and [Niroop Ammbashankar](https://github.com/nammbash).

## 💁Collaborations

Welcome to raise any interesting ideas on model compression techniques and LLM-based chatbot development! Feel free to reach [us](mailto:itrex.maintainers@intel.com) and look forward to our collaborations on Intel Extension for Transformers!
Welcome to raise any interesting ideas on model compression techniques and LLM-based chatbot development! Feel free to reach [us](mailto:itrex.maintainers@intel.com), and we look forward to our collaborations on Intel Extension for Transformers!
@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
torchvision
onnx>=1.12
onnxruntime==1.13.1
@@ -3,7 +3,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
torchvision
onnx>=1.12
onnxruntime==1.13.1
@@ -1,2 +1,2 @@
transformers
torch==2.0.1
torch==2.1.0
@@ -70,7 +70,7 @@ Modify the `user.conf` when you run different models:

+ When you run minilm, please also add `--minilm=true` for both performance and accuracy.

+ When you run benchmark on `SPR` machine, please add `--inter_parallel=28 and set --INST_NUM=28` for both perfomance and accuracy.
+ When you run benchmark on `SPR` machine, please add `--inter_parallel=28 and set --INST_NUM=28` for both performance and accuracy.

+ When you run bert large please keep batch size as 4.

@@ -72,7 +72,7 @@ graph.save('./ir')
```

# Benchmark
If you want to run local onnx model inference, we provide with python API and C++ API. To use C++ API, you need to transfer to model ir fisrt.
If you want to run local onnx model inference, we provide with python API and C++ API. To use C++ API, you need to transfer to model ir first.

By setting ``--dynamic_quanzite`` for FP32 model, you could benchmark dynamic quantize int8 model.
## Accuracy
@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.12.1

@@ -256,7 +256,7 @@ NOTES: ** the multiplication and addition operation amount when model inference
</tr>
<tr>
<td>IRQ Balance</td>
<td>Eabled</td>
<td>Enabled</td>
</tr>
<tr>
<td>CPU Model</td>
@@ -1,6 +1,6 @@
transformers
datasets
torchprofile
torch==2.0.1
torch==2.1.0
intel_extension_for_pytorch
accelerate
@@ -28,7 +28,7 @@ python run_qa.py \

## Step 2: Distributed Data Parallel Training

We supporte Distributed Data Parallel training on single node and multi nodes settings for pruning. To use Distributed Data Parallel to speedup training, the bash command needs a small adjustment.
We support Distributed Data Parallel training on single node and multi nodes settings for pruning. To use Distributed Data Parallel to speedup training, the bash command needs a small adjustment.
<br>
*`<MASTER_ADDRESS>`* is the address of the master node, it won't be necessary for single node case,
<br>
@@ -1,5 +1,5 @@
datasets >= 1.8.0
torch==2.0.1
torch==2.1.0
transformers
wandb
accelerate
@@ -1,5 +1,5 @@
accelerate
datasets
transformers
torch==2.0.1
torch==2.1.0
neural-compressor==2.0
@@ -4,6 +4,6 @@ sentencepiece != 0.1.92
rouge-score
nltk
py7zr
torch==2.0.1
torch==2.1.0
transformers
protobuf
@@ -1,4 +1,4 @@
torch==2.0.1
torch==2.1.0
numpy
transformers
datasets
@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -1,4 +1,4 @@
torch==2.0.1
torch==2.1.0
transformers
datasets
allennlp
@@ -6,5 +6,5 @@ sentencepiece
scipy
scikit-learn
protobuf
torch==2.0.1
torch==2.1.0
evaluate
@@ -1,5 +1,5 @@
accelerate
torch==2.0.1
torch==2.1.0
datasets >= 1.1.3
sentencepiece != 0.1.92
transformers
@@ -2,6 +2,6 @@ accelerate
datasets >= 1.1.3
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
transformers
wandb
@@ -2,6 +2,6 @@ accelerate
datasets >= 1.1.3
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
transformers
wandb
@@ -2,6 +2,6 @@ accelerate
datasets >= 1.1.3
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
transformers
wandb