This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

Commit

update python api readme (#523)
* [CPP Graph] Opt qbits dequant (#465)

* use INC 2.3.1

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

* use INC 2.3.1 (#500)

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

* [RUNTIME] Enabling streaming LLM for Runtime (#501)

* Support StreamingLLM on CPU

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

* Reduce the UT evaluation time (#498)

Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: Wenxin Zhang <wenxin.zhang@intel.com>

* Minor fix (#507)

* Fix ChatGLM2 model loading issue (#510)

* Fix ChatGLM2 model loading issue

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Remove OneDNN env setting for BF16 inference (#509)

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: VincyZhang <wenxin.zhang@intel.com>

* support AVX2 (#493)

* support Memcpy2D

* support gelu fusion

---------

Co-authored-by: luoyu-intel <yu.luo@intel.com>

* add neuralchat ut for audio util (#466)

* reduce ut time consumption (#499)

Signed-off-by: Xin He <xin3.he@intel.com>

* update python api readme (#504)

* Add docker setup session for neuralchat finetuning sample (#496)

* Update README.md to new added docker setup session

Signed-off-by: Louie Tsai <louie.tsai@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md for fast token issue (#515)

Signed-off-by: Louie Tsai <louie.tsai@intel.com>

* Fix typo in README.md (#516)

convertion -> conversion

Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* improve AVX2 (#511)

* Revert "update python api readme (#504)"

This reverts commit 5f4175a.

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md (#519)

Signed-off-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>

* docs: fix typos in question answering of pytorch (#520)

Signed-off-by: Surav Shrestha <suravshresth@gmail.com>

* fixed typos (#522)

* Updated README.md (#517)

Signed-off-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>

* update python api readme

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

* fix readme

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Add Data type description
Align Doc and help info

Signed-off-by: Hengyu Meng <hengyu.meng@intel.com>

* align

Signed-off-by: Hengyu Meng <hengyu.meng@intel.com>

* fix eos token id

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

---------

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: Haihao Shen <haihao.shen@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Signed-off-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>
Signed-off-by: Surav Shrestha <suravshresth@gmail.com>
Signed-off-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>
Signed-off-by: Dong, Bo <bo1.dong@intel.com>
Signed-off-by: Hengyu Meng <hengyu.meng@intel.com>
Co-authored-by: Wang, Zhe <zhe1.wang@intel.com>
Co-authored-by: Wenxin Zhang <wenxin.zhang@intel.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Haihao Shen <haihao.shen@intel.com>
Co-authored-by: yuchengliu1 <yucheng.liu@intel.com>
Co-authored-by: luoyu-intel <yu.luo@intel.com>
Co-authored-by: Liangyx2 <106130696+Liangyx2@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>
Co-authored-by: Surav Shrestha <148626286+shresthasurav@users.noreply.github.com>
Co-authored-by: Smoothieewastaken <86610201+Smoothieewastaken@users.noreply.github.com>
Co-authored-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>
Co-authored-by: Dong, Bo <bo1.dong@intel.com>
Co-authored-by: Hengyu Meng <hengyu.meng@intel.com>
18 people authored Oct 23, 2023
1 parent 1ab6ce3 commit 0ce90db
Showing 118 changed files with 1,637 additions and 1,308 deletions.
19 changes: 10 additions & 9 deletions .github/workflows/script/unitTest/env_setup.sh
@@ -7,15 +7,16 @@ if [ ${inc} != 0 ]; then
fi

echo "Install neural_compressor binary..."
n=0
until [ "$n" -ge 5 ]; do
git clone https://github.com/intel/neural-compressor.git /neural-compressor
cd /neural-compressor
pip install -r requirements.txt
python setup.py install && break
n=$((n + 1))
sleep 5
done
pip install neural-compressor
#n=0
#until [ "$n" -ge 5 ]; do
# git clone https://github.com/intel/neural-compressor.git /neural-compressor
# cd /neural-compressor
# pip install -r requirements.txt
# python setup.py install && break
# n=$((n + 1))
# sleep 5
#done

# Install test requirements
cd /intel-extension-for-transformers/tests
4 changes: 3 additions & 1 deletion .github/workflows/unit-test-neuralchat.yml
@@ -73,6 +73,8 @@ jobs:
podman run -dit --disable-content-trust --privileged --name=${{ env.CONTAINER_NAME }} -v /dev/shm:/dev/shm \
-v ${{ github.workspace }}:/intel-extension-for-transformers \
-v ~/.cache/oneAPI:/cache \
-v /models:/models \
-v /media:/media \
${{ env.REPO_NAME }}:${{ env.REPO_TAG }}
- name: Env build
@@ -143,4 +145,4 @@ jobs:
with:
name: Neural Chat Unit Test
path: ${{ github.workspace }}/log_dir
retention-days: 5
retention-days: 5
47 changes: 24 additions & 23 deletions README.md
@@ -11,17 +11,11 @@ Intel® Extension for Transformers
</div>

## 🚀Latest News
* <b>NeuralChat has been showcased in [Intel Innovation’23 Keynote](https://www.youtube.com/watch?v=RbKRELWP9y8&t=2954s) and [Google Cloud Next'23](https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next-23) to demonstrate GenAI/LLM capabilities on Intel Xeon Scalable Processors.</b>
* <b>NeuralChat supports custom chatbot development and deployment on broad Intel HWs such as Xeon Scalable Processors, Gaudi2, Xeon CPU Max Series, Data Center GPU Max Series, Arc Series, and Core Processors. Check out [Notebooks](./intel_extension_for_transformers/neural_chat/docs/full_notebooks.md) and see below sample code. </b>

```python
# pip install intel-extension-for-transformers
from intel_extension_for_transformers.neural_chat import build_chatbot
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
```

* <b>LLM runtime extends Hugging Face Transformers API to provide seamless low precision inference for popular LLMs, supporting mainstream low precision data types such as INT8/FP8/INT4/FP4/NF4.</b>
* [2023/10] LLM runtime, an Intel-optimized [GGML](https://github.com/ggerganov/ggml) compatible runtime, demonstrates **up to 15x performance gain in 1st token generation and 1.5x in other token generation** over the default [llama.cpp](https://github.com/ggerganov/llama.cpp).
* [2023/10] LLM runtime now supports LLM inference with **infinite-length inputs up to 4 million tokens**, inspired from [StreamingLLM](https://arxiv.org/abs/2309.17453).
* [2023/09] NeuralChat has been showcased in [**Intel Innovation’23 Keynote**](https://www.youtube.com/watch?v=RbKRELWP9y8&t=2954s) and [Google Cloud Next'23](https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next-23) to demonstrate GenAI/LLM capabilities on Intel Xeon Scalable Processors.
* [2023/08] NeuralChat supports **custom chatbot development and deployment within minutes** on broad Intel HWs such as Xeon Scalable Processors, Gaudi2, Xeon CPU Max Series, Data Center GPU Max Series, Arc Series, and Core Processors. Check out [Notebooks](./intel_extension_for_transformers/neural_chat/docs/full_notebooks.md).
* [2023/07] LLM runtime extends Hugging Face Transformers API to provide seamless low precision inference for popular LLMs, supporting low precision data types such as INT3/INT4/FP4/NF4/INT5/INT8/FP8.

---
<div align="left">
@@ -34,25 +28,31 @@ pip install intel-extension-for-transformers
> For more installation methods, please refer to [Installation Page](./docs/installation.md)
## 🌟Introduction
Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, in particular effective on 4th Intel Xeon Scalable processor Sapphire Rapids (codenamed [Sapphire Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html)). The toolkit provides the below key features and examples:

Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, in particular, effective on 4th Intel Xeon Scalable processor Sapphire Rapids (codenamed [Sapphire Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html)). The toolkit provides the below key features and examples:

* Seamless user experience of model compressions on Transformer-based models by extending [Hugging Face transformers](https://github.com/huggingface/transformers) APIs and leveraging [Intel® Neural Compressor](https://github.com/intel/neural-compressor)


* Advanced software optimizations and unique compression-aware runtime (released with NeurIPS 2022's paper [Fast Distilbert on CPUs](https://arxiv.org/abs/2211.07715) and [QuaLA-MiniLM: a Quantized Length Adaptive MiniLM](https://arxiv.org/abs/2210.17114), and NeurIPS 2021's paper [Prune Once for All: Sparse Pre-Trained Language Models](https://arxiv.org/abs/2111.05754))


* Optimized Transformer-based model packages such as [Stable Diffusion](examples/huggingface/pytorch/text-to-image/deployment/stable_diffusion), [GPT-J-6B](examples/huggingface/pytorch/text-generation/deployment), [GPT-NEOX](examples/huggingface/pytorch/language-modeling/quantization#2-validated-model-list), [BLOOM-176B](examples/huggingface/pytorch/language-modeling/inference#BLOOM-176B), [T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list), [Flan-T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list) and end-to-end workflows such as [SetFit-based text classification](docs/tutorials/pytorch/text-classification/SetFit_model_compression_AGNews.ipynb) and [document level sentiment analysis (DLSA)](workflows/dlsa)
* Optimized Transformer-based model packages such as [Stable Diffusion](examples/huggingface/pytorch/text-to-image/deployment/stable_diffusion), [GPT-J-6B](examples/huggingface/pytorch/text-generation/deployment), [GPT-NEOX](examples/huggingface/pytorch/language-modeling/quantization#2-validated-model-list), [BLOOM-176B](examples/huggingface/pytorch/language-modeling/inference#BLOOM-176B), [T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list), [Flan-T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list), and end-to-end workflows such as [SetFit-based text classification](docs/tutorials/pytorch/text-classification/SetFit_model_compression_AGNews.ipynb) and [document level sentiment analysis (DLSA)](workflows/dlsa)

* [NeuralChat](intel_extension_for_transformers/neural_chat), a customizable chatbot framework to create your own chatbot within minutes by leveraging a rich set of plugins [Knowledge Retrieval](./intel_extension_for_transformers/neural_chat/pipeline/plugins/retrieval/README.md), [Speech Interaction](./intel_extension_for_transformers/neural_chat/pipeline/plugins/audio/README.md), [Query Caching](./intel_extension_for_transformers/neural_chat/pipeline/plugins/caching/README.md), [Security Guardrail](./intel_extension_for_transformers/neural_chat/pipeline/plugins/security/README.md).


* [Inference](intel_extension_for_transformers/llm/runtime/graph) of Large Language Model (LLM) in pure C/C++ with weight-only quantization kernels, supporting [GPT-NEOX](intel_extension_for_transformers/llm/runtime/graph/models/gptneox), [LLAMA](intel_extension_for_transformers/llm/runtime/graph/models/llama), [MPT](intel_extension_for_transformers/llm/runtime/graph/models/mpt), [FALCON](intel_extension_for_transformers/llm/runtime/graph/models/falcon), [BLOOM-7B](intel_extension_for_transformers/llm/runtime/graph/models/bloom), [OPT](intel_extension_for_transformers/llm/runtime/graph/models/opt), [ChatGLM2-6B](intel_extension_for_transformers/llm/runtime/graph/models/chatglm), [GPT-J-6B](intel_extension_for_transformers/llm/runtime/graph/models/gptj) and [Dolly-v2-3B](intel_extension_for_transformers/llm/runtime/graph/models/gptneox)


## 🌱Getting Started
Below are the sample code to enable weight-only low precision inference. See more [examples](intel_extension_for_transformers/llm/runtime/graph).
Below is the sample code to enable the chatbot. See more [examples](intel_extension_for_transformers/neural_chat/docs/full_notebooks.md).

### Chatbot
```python
# pip install intel-extension-for-transformers
from intel_extension_for_transformers.neural_chat import build_chatbot
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
```

Below is the sample code to enable weight-only INT4/INT8 inference. See more [examples](intel_extension_for_transformers/llm/runtime/graph).

### INT4 Inference
```python
@@ -90,7 +90,7 @@ outputs = tokenizer.batch_decode(gen_tokens)
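The INT4 sample itself is collapsed in this diff view. As a point of reference only, a minimal sketch of a weight-only INT4 call through the extended Transformers API might look like the following; the model id, prompt, and generation settings are illustrative assumptions, not the collapsed snippet.

```python
# Minimal sketch, not the collapsed README snippet: model id, prompt, and
# generation settings are illustrative assumptions.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Intel/neural-chat-7b-v1-1"  # any Hugging Face model id or local path
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids

# load_in_4bit requests weight-only INT4 quantization at model load time
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
gen_tokens = model.generate(inputs, max_new_tokens=300)
outputs = tokenizer.batch_decode(gen_tokens)
print(outputs[0])
```

Weight-only quantization touches only the model weights at load time, so the tokenizer and the generate call stay standard Transformers code.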

## 🎯Validated Models
Here is the average accuracy of validated models on Lambada (OpenAI), HellaSwag, Winogrande, PIQA, and WikiText.
The next token latency is based on 32 input tokens and greedy search on Intel's 4th Generation Xeon Scalable Sapphire Rapids processor.
The subsequent token latency is based on 32 input tokens and greedy search on Intel's 4th Generation Xeon Scalable Sapphire Rapids processor.

| Model | FP32 | INT4 Accuracy (Group size 32) | INT4 Accuracy (Group size 128) | Next Token Latency |
|---------------------|:----------------------:|:-----------------------:|:----------------------------:|:------------:|
@@ -136,8 +136,9 @@ Find other models like ChatGLM, ChatGLM2, StarCoder... in [LLM Runtime](./intel_
</tr>
<tr>
<td colspan="2" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/README.md">LLM Runtime</a></td>
<td colspan="3" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/core/README.md">Low Precision Kernels</a></td>
<td colspan="3" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/tensor_parallelism.md">Tensor Parallelism</a></td>
<td colspan="2" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/README.md#2-run-llm-with-python-api">Streaming LLM</a></td>
<td colspan="2" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/core/README.md">Low Precision Kernels</a></td>
<td colspan="2" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/tensor_parallelism.md">Tensor Parallelism</a></td>
</tr>
<tr>
<th colspan="8" align="center">LLM COMPRESSION</th>
@@ -204,10 +205,10 @@ Find other models like ChatGLM, ChatGLM2, StarCoder... in [LLM Runtime](./intel_


## Acknowledgements
* Excellent open-source projects: [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [FastChat](https://github.com/lm-sys/FastChat), [fastRAG](https://github.com/IntelLabs/fastRAG), [ggml](https://github.com/ggerganov/ggml), [gptq](https://github.com/IST-DASLab/gptq), [llama.cpp](https://github.com/ggerganov/llama.cpp), [lm-evauation-harness](https://github.com/EleutherAI/lm-evaluation-harness), [peft](https://github.com/huggingface/peft), [trl](https://github.com/huggingface/trl), and many others.
* Excellent open-source projects: [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [FastChat](https://github.com/lm-sys/FastChat), [fastRAG](https://github.com/IntelLabs/fastRAG), [ggml](https://github.com/ggerganov/ggml), [gptq](https://github.com/IST-DASLab/gptq), [llama.cpp](https://github.com/ggerganov/llama.cpp), [lm-evauation-harness](https://github.com/EleutherAI/lm-evaluation-harness), [peft](https://github.com/huggingface/peft), [trl](https://github.com/huggingface/trl), [streamingllm](https://github.com/mit-han-lab/streaming-llm) and many others.

* Thanks to all the contributors including [Ikko Eltociear Ashimine](https://github.com/eltociear), [Hardik Kamboj](https://github.com/hardikkamboj), [Sangjune Park](https://github.com/JJukE), [Kevin Ta](https://github.com/kta-intel), [Huiyan Cao](https://github.com/huiyan2021), [Xigui Wang](https://github.com/xiguiw), [Jiafu Zhang](https://github.com/jiafuzha), [Tyler Titsworth](https://github.com/tylertitsworth), [Yi Wang](https://github.com/sywangyi), [Samanway Sadhu](https://github.com/SamanwaySadhu), [Jiqing Feng](https://github.com/jiqing-feng), [Jonathan Mamou](https://github.com/jmamou) and [Niroop Ammbashankar](https://github.com/nammbash).

## 💁Collaborations

Welcome to raise any interesting ideas on model compression techniques and LLM-based chatbot development! Feel free to reach [us](mailto:itrex.maintainers@intel.com) and look forward to our collaborations on Intel Extension for Transformers!
Welcome to raise any interesting ideas on model compression techniques and LLM-based chatbot development! Feel free to reach [us](mailto:itrex.maintainers@intel.com), and we look forward to our collaborations on Intel Extension for Transformers!
@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
torchvision
onnx>=1.12
onnxruntime==1.13.1
@@ -3,7 +3,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
torchvision
onnx>=1.12
onnxruntime==1.13.1
@@ -1,2 +1,2 @@
transformers
torch==2.0.1
torch==2.1.0
@@ -70,7 +70,7 @@ Modify the `user.conf` when you run different models:

+ When you run minilm, please also add `--minilm=true` for both performance and accuracy.

+ When you run benchmark on `SPR` machine, please add `--inter_parallel=28 and set --INST_NUM=28` for both perfomance and accuracy.
+ When you run benchmark on `SPR` machine, please add `--inter_parallel=28 and set --INST_NUM=28` for both performance and accuracy.

+ When you run bert large please keep batch size as 4.

@@ -72,7 +72,7 @@ graph.save('./ir')
```

# Benchmark
If you want to run local onnx model inference, we provide with python API and C++ API. To use C++ API, you need to transfer to model ir fisrt.
If you want to run local onnx model inference, we provide with python API and C++ API. To use C++ API, you need to transfer to model ir first.

By setting ``--dynamic_quanzite`` for FP32 model, you could benchmark dynamic quantize int8 model.
## Accuracy
@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.12.1

@@ -256,7 +256,7 @@ NOTES: ** the multiplication and addition operation amount when model inference
</tr>
<tr>
<td>IRQ Balance</td>
<td>Eabled</td>
<td>Enabled</td>
</tr>
<tr>
<td>CPU Model</td>
@@ -1,6 +1,6 @@
transformers
datasets
torchprofile
torch==2.0.1
torch==2.1.0
intel_extension_for_pytorch
accelerate
@@ -28,7 +28,7 @@ python run_qa.py \

## Step 2: Distributed Data Parallel Training

We supporte Distributed Data Parallel training on single node and multi nodes settings for pruning. To use Distributed Data Parallel to speedup training, the bash command needs a small adjustment.
We support Distributed Data Parallel training on single node and multi nodes settings for pruning. To use Distributed Data Parallel to speedup training, the bash command needs a small adjustment.
<br>
*`<MASTER_ADDRESS>`* is the address of the master node, it won't be necessary for single node case,
<br>
@@ -1,5 +1,5 @@
datasets >= 1.8.0
torch==2.0.1
torch==2.1.0
transformers
wandb
accelerate
@@ -1,5 +1,5 @@
accelerate
datasets
transformers
torch==2.0.1
torch==2.1.0
neural-compressor==2.0
@@ -4,6 +4,6 @@ sentencepiece != 0.1.92
rouge-score
nltk
py7zr
torch==2.0.1
torch==2.1.0
transformers
protobuf
@@ -1,4 +1,4 @@
torch==2.0.1
torch==2.1.0
numpy
transformers
datasets
@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -1,4 +1,4 @@
torch==2.0.1
torch==2.1.0
transformers
datasets
allennlp
@@ -6,5 +6,5 @@ sentencepiece
scipy
scikit-learn
protobuf
torch==2.0.1
torch==2.1.0
evaluate
@@ -1,5 +1,5 @@
accelerate
torch==2.0.1
torch==2.1.0
datasets >= 1.1.3
sentencepiece != 0.1.92
transformers
@@ -2,6 +2,6 @@ accelerate
datasets >= 1.1.3
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
transformers
wandb
@@ -2,6 +2,6 @@ accelerate
datasets >= 1.1.3
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
transformers
wandb
@@ -2,6 +2,6 @@ accelerate
datasets >= 1.1.3
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
transformers
wandb