Merge branch 'develop' of https://github.com/lugimzzz/PaddleNLP into peft
lugimzzz · Jan 2, 2024 · 58febcc
2 parents 32cb533 + 1982091
commit 58febcc
Showing 59 changed files with 1,701 additions and 2,424 deletions.
diff --git a/README.md b/README.md
@@ -285,7 +285,7 @@ PaddleNLP针对信息抽取、语义检索、智能问答、情感分析等高
 AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_fast=True)
 ```
 
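-For even higher model deployment performance, after installing FastTokenizers you only need to enable the `use_fast=True` option on the `AutoTokenizer` API to call the high-performance tokenization operators implemented in C++ and easily obtain text-processing speedups of more than a hundred times over Python. For more usage instructions, see the [FastTokenizer documentation](./fast_tokenizer).
+For even higher model deployment performance, after installing FastTokenizer you only need to enable the `use_fast=True` option on the `AutoTokenizer` API to call the high-performance tokenization operators implemented in C++ and easily obtain text-processing speedups of more than a hundred times over Python. For more usage instructions, see the [FastTokenizer documentation](./fast_tokenizer).
 
 #### ⚡️ FastGeneration: High-Performance Generation Acceleration Library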
 

diff --git a/applications/text_classification/hierarchical/deploy/paddle_serving/README.md b/applications/text_classification/hierarchical/deploy/paddle_serving/README.md
@@ -55,7 +55,9 @@ pip install paddle-serving-server-gpu==0.8.3.post112 -i https://pypi.tuna.tsingh
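 - For more wheel packages, see the [official Serving documentation](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Latest_Packages_CN.md)
 
 ### Install the FastTokenizer text-processing acceleration library (optional)
-We recommend installing fast_tokenizer for maximum text-processing efficiency and further improved serving performance.
+> Important: FastTokenizer has not been maintained for a long time, so tokenization may be inconsistent between training (Python-based tokenizer) and deployment (C++-based tokenizer). To ensure stability and consistency, we recommend not installing this library.
+
+If you want to install fast_tokenizer for higher text-processing efficiency and significantly better serving performance, you can do so with the following command: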
 ```shell
 pip install fast-tokenizer-python
 ```

diff --git a/applications/text_classification/hierarchical/deploy/predictor/README.md b/applications/text_classification/hierarchical/deploy/predictor/README.md
@@ -20,7 +20,9 @@ python -m pip install onnxruntime psutil
 ```
 
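 Install the FastTokenizer text-processing acceleration library (optional)
-We recommend installing fast_tokenizer for maximum text-processing efficiency and further improved serving performance.
+> Important: FastTokenizer has not been maintained for a long time, so tokenization may be inconsistent between training (Python-based tokenizer) and deployment (C++-based tokenizer). To ensure stability and consistency, we recommend not installing this library.
+
+If you want to install fast_tokenizer for higher text-processing efficiency and significantly better serving performance, you can do so with the following command: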
 ```shell
 pip install fast-tokenizer-python
 ```

diff --git a/applications/text_classification/hierarchical/deploy/triton_serving/README.md b/applications/text_classification/hierarchical/deploy/triton_serving/README.md
@@ -48,11 +48,11 @@ python3 -m pip install paddlepaddle-gpu paddlenlp -i https://mirror.baidu.com/py
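 3. For a more detailed PaddleNLP installation tutorial, see [Installation](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/get_started/installation.rst).
 
 
-### Install the FastTokenizers text-processing acceleration library (optional)
+### Install the FastTokenizer text-processing acceleration library (optional)
 
-We recommend installing fast_tokenizer for maximum text-processing efficiency and further improved serving performance.
+> Important: FastTokenizer has not been maintained for a long time, so tokenization may be inconsistent between training (Python-based tokenizer) and deployment (C++-based tokenizer). To ensure stability and consistency, we recommend not installing this library.
 
-Install fast_tokenizer inside the container
+If you want to install fast_tokenizer for higher text-processing efficiency and significantly better serving performance, you can do so with the following command: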
 ```shell
 python3 -m pip install fast-tokenizer-python
 ```

diff --git a/applications/text_classification/multi_class/deploy/triton_serving/README.md b/applications/text_classification/multi_class/deploy/triton_serving/README.md
@@ -50,9 +50,9 @@ python3 -m pip install paddlepaddle-gpu paddlenlp -i https://mirror.baidu.com/py
 
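 ### Install the FastTokenizer text-processing acceleration library (optional)
 
-The deployment environment is Linux; we recommend installing fast_tokenizer for maximum text-processing efficiency and further improved serving performance.
+> Important: FastTokenizer has not been maintained for a long time, so tokenization may be inconsistent between training (Python-based tokenizer) and deployment (C++-based tokenizer). To ensure stability and consistency, we recommend not installing this library.
 
-Install fast_tokenizer inside the container
+If you want to install fast_tokenizer for higher text-processing efficiency and significantly better serving performance, you can do so with the following command: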
 ```shell
 python3 -m pip install fast-tokenizer-python
 ```

diff --git a/applications/text_classification/multi_label/deploy/paddle_serving/README.md b/applications/text_classification/multi_label/deploy/paddle_serving/README.md
@@ -52,7 +52,9 @@ pip install paddle-serving-server-gpu==0.8.3.post112 -i https://pypi.tuna.tsingh
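 - For more wheel packages, see the [official Serving documentation](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Latest_Packages_CN.md)
 
 ### Install the FastTokenizer text-processing acceleration library (optional)
-We recommend installing fast_tokenizer for maximum text-processing efficiency and further improved serving performance.
+> Important: FastTokenizer has not been maintained for a long time, so tokenization may be inconsistent between training (Python-based tokenizer) and deployment (C++-based tokenizer). To ensure stability and consistency, we recommend not installing this library.
+
+If you want to install fast_tokenizer for higher text-processing efficiency and significantly better serving performance, you can do so with the following command: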
 ```shell
 pip install fast-tokenizer-python
 ```

diff --git a/applications/text_classification/multi_label/deploy/predictor/README.md b/applications/text_classification/multi_label/deploy/predictor/README.md
@@ -21,7 +21,9 @@ python -m pip install onnxruntime
 ```
 
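 Install the FastTokenizer text-processing acceleration library (optional)
-We recommend installing fast_tokenizer for maximum text-processing efficiency and further improved serving performance.
+> Important: FastTokenizer has not been maintained for a long time, so tokenization may be inconsistent between training (Python-based tokenizer) and deployment (C++-based tokenizer). To ensure stability and consistency, we recommend not installing this library.
+
+If you want to install fast_tokenizer for higher text-processing efficiency and significantly better serving performance, you can do so with the following command: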
 ```shell
 pip install fast-tokenizer-python
 ```

diff --git a/applications/text_classification/multi_label/deploy/triton_serving/README.md b/applications/text_classification/multi_label/deploy/triton_serving/README.md
@@ -50,9 +50,9 @@ python3 -m pip install paddlepaddle-gpu paddlenlp -i https://mirror.baidu.com/py
 
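 ### Install the FastTokenizer text-processing acceleration library (optional)
 
-We recommend installing fast_tokenizer for maximum text-processing efficiency and further improved serving performance.
+> Important: FastTokenizer has not been maintained for a long time, so tokenization may be inconsistent between training (Python-based tokenizer) and deployment (C++-based tokenizer). To ensure stability and consistency, we recommend not installing this library.
 
-Install fast_tokenizer inside the container
+If you want to install fast_tokenizer for higher text-processing efficiency and significantly better serving performance, you can do so with the following command: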
 ```shell
 python3 -m pip install fast-tokenizer-python
 ```

diff --git a/csrc/generation/set_alibi_mask_value.cu b/csrc/generation/set_alibi_mask_value.cu
diff --git a/csrc/generation/set_mask_value.cu b/csrc/generation/set_mask_value.cu
diff --git a/csrc/setup_cuda.py b/csrc/setup_cuda.py
@@ -55,7 +55,6 @@ def get_gencode_flags():
     ext_modules=CUDAExtension(
         sources=[
             "./generation/save_with_output.cc",
-            "./generation/set_mask_value.cu",
             "./generation/set_value_by_flags.cu",
             "./generation/token_penalty_multi_scores.cu",
             "./generation/stop_generation_multi_ends.cu",
@@ -66,7 +65,6 @@ def get_gencode_flags():
             "./generation/transpose_removing_padding.cu",
             "./generation/write_cache_kv.cu",
             "./generation/encode_rotary_qk.cu",
-            "./generation/set_alibi_mask_value.cu",
             "./generation/quant_int8.cu",
             "./generation/dequant_int8.cu",
         ],

diff --git a/examples/language_model/gpt b/examples/language_model/gpt
diff --git a/examples/language_model/moe/dygraph/lr.py b/examples/language_model/moe/dygraph/lr.py