[PaddleNLP 3.0] Refactor/3 part1- remove fast tokenizer. #8613

Merged 9 commits on Jun 18, 2024
48 changes: 0 additions & 48 deletions .github/workflows/fast_tokenizer.yml

This file was deleted.

12 changes: 0 additions & 12 deletions README.md
@@ -279,18 +279,6 @@ For high-frequency scenarios such as information extraction, semantic retrieval, intelligent question answering, and sentiment analysis, PaddleNLP…

### High-Performance Distributed Training and Inference

#### ⚡ FastTokenizer: High-Performance Text Processing Library

<div align="center">
<img src="https://user-images.githubusercontent.com/11793384/168407921-b4395b1d-44bd-41a0-8c58-923ba2b703ef.png" width="400">
</div>

```python
AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_fast=True)
```

To achieve the best model deployment performance, install FastTokenizer and simply enable the `use_fast=True` option on the `AutoTokenizer` API to invoke the high-performance tokenization operators implemented in C++, easily obtaining text processing more than a hundred times faster than Python. For more usage details, see the [FastTokenizer documentation](./fast_tokenizer).
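For context, a minimal sketch of how this removed flag was exercised (assuming `paddlenlp` and the deprecated `fast-tokenizer-python` package are installed; the sample sentence is a hypothetical placeholder):

```python
from paddlenlp.transformers import AutoTokenizer

# With fast-tokenizer-python installed, use_fast=True routed encoding through
# the C++ FastTokenizer operators; without it, the pure-Python path was used.
tokenizer = AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_fast=True)
print(tokenizer("PaddleNLP provides fast tokenization.")["input_ids"])
```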

#### ⚡️ FastGeneration: High-Performance Generation Acceleration Library

<div align="center">
12 changes: 0 additions & 12 deletions README_en.md
@@ -224,18 +224,6 @@ For more details please refer to [Speech Command Analysis](./applications/speech

### High Performance Distributed Training and Inference

#### ⚡ FastTokenizer: High Performance Text Preprocessing Library

<div align="center">
<img src="https://user-images.githubusercontent.com/11793384/168407921-b4395b1d-44bd-41a0-8c58-923ba2b703ef.png" width="400">
</div>

```python
AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_fast=True)
```

Set `use_fast=True` to use the C++ tokenizer kernel and get up to 100x faster text preprocessing. For more usage details, please refer to [FastTokenizer](./fast_tokenizer).

#### ⚡ FastGeneration: High Performance Generation Library

<div align="center">
1 change: 0 additions & 1 deletion applications/information_extraction/taskflow_doc.md
@@ -303,7 +303,6 @@ Text recognized by OCR is sorted from top-left to bottom-right; for multi-column…
* `layout_analysis`: Whether to use PP-Structure to analyze the document layout and optimize the ordering of layout information. Defaults to False.
* `position_prob`: The model's probability for a span's start/end position, between 0 and 1; returned results below this threshold are dropped. Defaults to 0.5. The final span probability is the product of the start-position and end-position probabilities.
* `precision`: Model precision. Defaults to `fp32`; the options are `fp16` and `fp32`. `fp16` inference is faster and supports GPU and NPU hardware. If you choose `fp16` on GPU hardware, first make sure the machine has the NVIDIA drivers and base software correctly installed, **with CUDA>=11.2 and cuDNN>=8.1.1**; on first use, follow the prompts to install the required dependencies. Also make sure the GPU's CUDA Compute Capability is greater than 7.0; typical devices include V100, T4, A10, A100, and GTX 20- and 30-series cards. For more on CUDA Compute Capability and precision support, see the NVIDIA documentation: [GPU Hardware and Supported Precision Matrix](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix).
* `use_fast`: Use the high-performance C++ FastTokenizer operators to accelerate text preprocessing. Requires installing FastTokenizer via `pip install fast-tokenizer-python` first. Defaults to `False`. For more usage details, see the [FastTokenizer documentation](../../fast_tokenizer).
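For reference, a minimal sketch of constructing a document-extraction Taskflow with the parameters above (the schema and document path are hypothetical placeholders):

```python
from pprint import pprint
from paddlenlp import Taskflow

# Hypothetical extraction schema and local document image, for illustration only.
schema = {"invoice": ["total amount", "issue date"]}
ie = Taskflow("information_extraction",
              schema=schema,
              model="uie-x-base",
              layout_analysis=False,
              position_prob=0.5,
              precision="fp32")
pprint(ie({"doc": "./invoice.png"}))
```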

## References
- **[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)**
4 changes: 1 addition & 3 deletions applications/information_extraction/taskflow_doc_en.md
@@ -286,8 +286,7 @@ The text recognized in OCR will be sorted from top left to bottom right. For cas
model='uie-x-base',
layout_analysis=False,
position_prob=0.5,
- precision='fp32',
- use_fast=False)
+ precision='fp32')
```

* `schema`: Define the extraction target; it can be configured by referring to the invocation examples for the different out-of-the-box tasks.
@@ -298,7 +297,6 @@
* `layout_analysis`: Whether to use PP-Structure to analyze the document layout and optimize the ordering of layout information. The default is False.
* `position_prob`: The model's probability for a span's start/end position is between 0 and 1; returned results below this threshold are removed. The default is 0.5, and the final span probability is the product of the start-position and end-position probabilities.
* `precision`: Select the model precision; the default is `fp32`, with `fp16` and `fp32` available. `fp16` inference is faster and supports GPU and NPU hardware. If you choose `fp16` on GPU hardware, please ensure the machine has the NVIDIA drivers and base software correctly installed. **Ensure that CUDA>=11.2 and cuDNN>=8.1.1**; on first use, follow the prompts to install the relevant dependencies. Also ensure that the CUDA Compute Capability of the GPU is greater than 7.0; typical devices include V100, T4, A10, A100, and GTX 20- and 30-series cards. For more about CUDA Compute Capability and precision support, please refer to the NVIDIA documentation: [GPU Hardware and Supported Precision Comparison Table](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix).
* `use_fast`: Use the high-performance C++ FastTokenizer operators to accelerate text preprocessing. FastTokenizer must first be installed via `pip install fast-tokenizer-python`. Defaults to `False`. For more usage instructions, please refer to the [FastTokenizer documentation](../../fast_tokenizer).
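As a usage sketch (assuming `ie` is the Taskflow constructed in the snippet above; the schema and document path are hypothetical), the schema can be reset at runtime and the taskflow applied to a local document image:

```python
# `ie` is the Taskflow constructed above; the schema and path are hypothetical.
ie.set_schema(["Total amount", "Invoice date"])  # reset the extraction target
print(ie({"doc": "./invoice.png"}))
```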

## References
- **[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)**
4 changes: 1 addition & 3 deletions applications/information_extraction/taskflow_text.md
@@ -489,8 +489,7 @@ UIE is not limited to particular industries or extraction targets; the following are some examples implemented via Taskflow…
batch_size=16,
model='uie-base',
position_prob=0.5,
- precision='fp32',
- use_fast=False)
+ precision='fp32')
```

* `schema`: Defines the extraction target; it can be configured by referring to the invocation examples for the different out-of-the-box tasks.
@@ -499,4 +498,3 @@
* `model`: The model used by the task. Defaults to `uie-base`; the options are `uie-base`, `uie-medium`, `uie-mini`, `uie-micro`, `uie-nano`, `uie-medical-base`, `uie-base-en`, and `uie-x-base`.
* `position_prob`: The model's probability for a span's start/end position, between 0 and 1; returned results below this threshold are dropped. Defaults to 0.5. The final span probability is the product of the start-position and end-position probabilities.
* `precision`: Model precision. Defaults to `fp32`; the options are `fp16` and `fp32`. `fp16` inference is faster and supports GPU and NPU hardware. If you choose `fp16` on GPU hardware, first make sure the machine has the NVIDIA drivers and base software correctly installed, **with CUDA>=11.2 and cuDNN>=8.1.1**; on first use, follow the prompts to install the required dependencies. Also make sure the GPU's CUDA Compute Capability is greater than 7.0; typical devices include V100, T4, A10, A100, and GTX 20- and 30-series cards. For more on CUDA Compute Capability and precision support, see the NVIDIA documentation: [GPU Hardware and Supported Precision Matrix](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix).
* `use_fast`: Use the high-performance C++ FastTokenizer operators to accelerate text preprocessing. Requires installing FastTokenizer via `pip install fast-tokenizer-python` first. Defaults to `False`. For more usage details, see the [FastTokenizer documentation](../../fast_tokenizer).
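For reference, a minimal usage sketch of the `ie` taskflow constructed above (the schema and input sentence are hypothetical examples for a Chinese UIE model):

```python
from pprint import pprint

# Assumes `ie` is the Taskflow constructed above; the schema and the example
# sentence are hypothetical.
ie.set_schema(["时间", "选手", "赛事名称"])  # reset the extraction target
pprint(ie("2月8日上午北京冬奥会自由式滑雪女子大跳台决赛中中国选手谷爱凌以188.25分获得金牌!"))
```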
4 changes: 1 addition & 3 deletions applications/information_extraction/taskflow_text_en.md
@@ -299,8 +299,7 @@ UIE does not limit industry fields and extraction targets. The following are som
batch_size=16,
model='uie-base',
position_prob=0.5,
- precision='fp32',
- use_fast=False)
+ precision='fp32')
```

* `schema`: Define the extraction target; it can be configured by referring to the invocation examples for the different out-of-the-box tasks.
@@ -309,4 +308,3 @@
* `model`: Select the model used by the task. The default is `uie-base`; the options are `uie-base`, `uie-medium`, `uie-mini`, `uie-micro`, `uie-nano`, `uie-medical-base`, `uie-base-en`, and `uie-x-base`.
* `position_prob`: The model's probability for a span's start/end position is between 0 and 1; returned results below this threshold are removed. The default is 0.5, and the final span probability is the product of the start-position and end-position probabilities.
* `precision`: Select the model precision; the default is `fp32`, with `fp16` and `fp32` available. `fp16` inference is faster and supports GPU and NPU hardware. If you choose `fp16` on GPU hardware, please ensure the machine has the NVIDIA drivers and base software correctly installed. **Ensure that CUDA>=11.2 and cuDNN>=8.1.1**; on first use, follow the prompts to install the relevant dependencies. Also ensure that the CUDA Compute Capability of the GPU is greater than 7.0; typical devices include V100, T4, A10, A100, and GTX 20- and 30-series cards. For more about CUDA Compute Capability and precision support, please refer to the NVIDIA documentation: [GPU Hardware and Supported Precision Comparison Table](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix).
* `use_fast`: Use the high-performance C++ FastTokenizer operators to accelerate text preprocessing. FastTokenizer must first be installed via `pip install fast-tokenizer-python`. Defaults to `False`. For more usage instructions, please refer to the [FastTokenizer documentation](../../fast_tokenizer).
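A minimal usage sketch of the `ie` taskflow constructed above (the schema and sentence are hypothetical; an English model such as `uie-base-en` would suit English text):

```python
from pprint import pprint

# Assumes `ie` is the Taskflow constructed above; the schema and the example
# sentence are hypothetical.
ie.set_schema(["Person", "Organization"])  # reset the extraction target
pprint(ie("In 1997, Steve was excited to become the CEO of Apple."))
```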
(changes to another file; file name not shown)
@@ -54,14 +54,6 @@ pip install paddle-serving-server-gpu==0.8.3.post112 -i https://pypi.tuna.tsinghua.edu.cn/simple
- The Tsinghua mirror source is enabled by default to speed up downloads; if you use an HTTP proxy you can remove it (-i https://pypi.tuna.tsinghua.edu.cn/simple)
- For more wheel packages, see the [Paddle Serving documentation](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Latest_Packages_CN.md)

### Install the FastTokenizer text processing acceleration library (optional)
> Important: FastTokenizer has gone unmaintained for a long time, so tokenization may become inconsistent between training (the Python-based tokenizer) and deployment (the C++-based tokenizer). For stability and consistency, we recommend not installing this library.

If you still want to install fast_tokenizer for higher text processing throughput and significantly better serving performance, you can install it with:
```shell
pip install fast-tokenizer-python
```


## Model Conversion

@@ -149,7 +141,6 @@ I0727 06:50:34.988327 43126 analysis_predictor.cc:1007] ======= optimize end ===
I0727 06:50:34.992336 43126 naive_executor.cc:102] --- skip [feed], feed -> token_type_ids
I0727 06:50:34.992357 43126 naive_executor.cc:102] --- skip [feed], feed -> input_ids
I0727 06:50:34.993671 43126 naive_executor.cc:102] --- skip [linear_75.tmp_1], fetch -> fetch
[2022-07-27 06:50:35,954] [ WARNING] - Can't find the fast_tokenizer package, please ensure install fast_tokenizer correctly. You can install fast_tokenizer by `pip install fast-tokenizer-python`.
[2022-07-27 06:50:35,954] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'ernie-3.0-medium-zh'.
[2022-07-27 06:50:35,954] [ INFO] - Already cached /root/.paddlenlp/models/ernie-3.0-medium-zh/ernie_3.0_medium_zh_vocab.txt
[OP Object] init success
(changes to another file; file name not shown)
@@ -19,14 +19,6 @@ python -m pip install onnxruntime-gpu onnx onnxconverter-common==1.9.0 psutil pa
python -m pip install onnxruntime psutil
```

Install the FastTokenizer text processing acceleration library (optional)
> Important: FastTokenizer has gone unmaintained for a long time, so tokenization may become inconsistent between training (the Python-based tokenizer) and deployment (the C++-based tokenizer). For stability and consistency, we recommend not installing this library.

If you still want to install fast_tokenizer for higher text processing throughput and significantly better serving performance, you can install it with:
```shell
pip install fast-tokenizer-python
```

## GPU Deployment and Inference Example
Deploy with the following command:
```
(changes to another file; file name not shown)
@@ -48,16 +48,6 @@ python3 -m pip install paddlepaddle-gpu paddlenlp -i https://mirror.baidu.com/py
3. For more detailed tutorials on installing PaddleNLP, see [Installation](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/get_started/installation.rst).


### Install the FastTokenizer text processing acceleration library (optional)

> Important: FastTokenizer has gone unmaintained for a long time, so tokenization may become inconsistent between training (the Python-based tokenizer) and deployment (the C++-based tokenizer). For stability and consistency, we recommend not installing this library.

If you still want to install fast_tokenizer for higher text processing throughput and significantly better serving performance, you can install it with:
```shell
python3 -m pip install fast-tokenizer-python
```


## Obtaining and Converting the Model

When using Triton for serving with the ONNX Runtime backend, the model must first be converted to ONNX format.
(changes to another file; file name not shown)
@@ -48,16 +48,6 @@ python3 -m pip install paddlepaddle-gpu paddlenlp -i https://mirror.baidu.com/py
3. For more detailed tutorials on installing PaddleNLP, see [Installation](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/get_started/installation.rst).


### Install the FastTokenizer text processing acceleration library (optional)

> Important: FastTokenizer has gone unmaintained for a long time, so tokenization may become inconsistent between training (the Python-based tokenizer) and deployment (the C++-based tokenizer). For stability and consistency, we recommend not installing this library.

If you still want to install fast_tokenizer for higher text processing throughput and significantly better serving performance, you can install it with:
```shell
python3 -m pip install fast-tokenizer-python
```


## Obtaining and Converting the Model

When using Triton for serving with the ONNX Runtime backend, the model must first be converted to ONNX format. The command below uses Paddle2ONNX to convert a Paddle static-graph model to the ONNX format; once it runs successfully, a model.onnx file will be generated in the current directory.
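A sketch of a typical Paddle2ONNX invocation (the model directory and file names here are hypothetical placeholders, not the exact command from this document):

```shell
paddle2onnx --model_dir ./export \
    --model_filename model.pdmodel \
    --params_filename model.pdiparams \
    --save_file model.onnx
```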
(changes to another file; file name not shown)
@@ -51,14 +51,6 @@ pip install paddle-serving-server-gpu==0.8.3.post112 -i https://pypi.tuna.tsinghua.edu.cn/simple
- The Tsinghua mirror source is enabled by default to speed up downloads; if you use an HTTP proxy you can remove it (-i https://pypi.tuna.tsinghua.edu.cn/simple)
- For more wheel packages, see the [Paddle Serving documentation](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Latest_Packages_CN.md)

### Install the FastTokenizer text processing acceleration library (optional)
> Important: FastTokenizer has gone unmaintained for a long time, so tokenization may become inconsistent between training (the Python-based tokenizer) and deployment (the C++-based tokenizer). For stability and consistency, we recommend not installing this library.

If you still want to install fast_tokenizer for higher text processing throughput and significantly better serving performance, you can install it with:
```shell
pip install fast-tokenizer-python
```


## Model Conversion

@@ -144,7 +136,6 @@ I0625 16:44:36.563802 40218 analysis_predictor.cc:1007] ======= optimize end ===
I0625 16:44:36.571702 40218 naive_executor.cc:102] --- skip [feed], feed -> token_type_ids
I0625 16:44:36.571728 40218 naive_executor.cc:102] --- skip [feed], feed -> input_ids
I0625 16:44:36.574352 40218 naive_executor.cc:102] --- skip [linear_147.tmp_1], fetch -> fetch
[2022-06-25 16:44:37,545] [ WARNING] - Can't find the fast_tokenizer package, please ensure install fast_tokenizer correctly. You can install fast_tokenizer by `pip install fast-tokenizer-python`.
[2022-06-25 16:44:37,546] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'ernie-3.0-medium-zh'.
[2022-06-25 16:44:37,546] [ INFO] - Already cached /root/.paddlenlp/models/ernie-3.0-medium-zh/ernie_3.0_base_zh_vocab.txt
[OP Object] init success
(changes to another file; file name not shown)
@@ -20,13 +20,6 @@ python -m pip install onnxruntime-gpu onnx onnxconverter-common==1.9.0 paddle2on
python -m pip install onnxruntime
```

Install the FastTokenizer text processing acceleration library (optional)
> Important: FastTokenizer has gone unmaintained for a long time, so tokenization may become inconsistent between training (the Python-based tokenizer) and deployment (the C++-based tokenizer). For stability and consistency, we recommend not installing this library.

If you still want to install fast_tokenizer for higher text processing throughput and significantly better serving performance, you can install it with:
```shell
pip install fast-tokenizer-python
```
## GPU Deployment and Inference Example

Deploy with the following command:
(changes to another file; file name not shown)
@@ -48,16 +48,6 @@ python3 -m pip install paddlepaddle-gpu paddlenlp -i https://mirror.baidu.com/py
3. For more detailed tutorials on installing PaddleNLP, see [Installation](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/get_started/installation.rst).


### Install the FastTokenizer text processing acceleration library (optional)

> Important: FastTokenizer has gone unmaintained for a long time, so tokenization may become inconsistent between training (the Python-based tokenizer) and deployment (the C++-based tokenizer). For stability and consistency, we recommend not installing this library.

If you still want to install fast_tokenizer for higher text processing throughput and significantly better serving performance, you can install it with:
```shell
python3 -m pip install fast-tokenizer-python
```


## Obtaining and Converting the Model

When using Triton for serving with the ONNX Runtime backend, the model must first be converted to ONNX format.
(changes to another file; file name not shown)
@@ -4,14 +4,6 @@

This directory provides `infer.py`, a Python deployment example for general text classification on CPU/GPU.

## Installing Dependencies

Run the following command to install the dependencies for this deployment example.

```bash
# Install fast_tokenizer and the GPU build of fastdeploy
pip install fast-tokenizer-python fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```

## Quick Start
