Merge conflicts
SylarTiaNII committed Jan 3, 2023
2 parents 2f6c461 + 0295f0d commit 97dd36e
Showing 433 changed files with 19,324 additions and 3,393 deletions.
12 changes: 10 additions & 2 deletions Makefile
@@ -35,7 +35,7 @@ lint:
 test: unit-test
 
 unit-test:
-	PYTHONPATH=$(shell pwd) pytest \
+	PYTHONPATH=$(shell pwd) pytest -x \
 		-n auto --cov paddlenlp \
 		--cov-report xml:coverage.xml
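
For reference, the `-x` flag added to the `unit-test` recipe makes pytest stop at the first failing test; `-n auto` is provided by the pytest-xdist plugin and `--cov` / `--cov-report` by pytest-cov. The following is a minimal Python sketch, not part of this commit, that assembles the same argument list. The function name `unit_test_command` and its `fail_fast` parameter are illustrative assumptions.

```python
import os
import shlex
from typing import List


def unit_test_command(fail_fast: bool = True) -> List[str]:
    """Build the pytest argument list that mirrors the `unit-test` recipe.

    Hypothetical helper for illustration; the Makefile itself runs pytest directly.
    """
    cmd = ["pytest"]
    if fail_fast:
        cmd.append("-x")  # stop at the first failing test
    # -n auto: one worker per CPU (pytest-xdist); --cov*: coverage (pytest-cov)
    cmd += ["-n", "auto", "--cov", "paddlenlp", "--cov-report", "xml:coverage.xml"]
    return cmd


if __name__ == "__main__":
    # The recipe sets PYTHONPATH to the repository root before calling pytest.
    print("PYTHONPATH=" + shlex.quote(os.getcwd()), shlex.join(unit_test_command()))
```

Passing `fail_fast=False` yields the argument list from before this change.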

@@ -63,4 +63,12 @@ deploy-paddlenlp:
 	# build
 	python3 setup.py sdist bdist_wheel
 	# upload
-	twine upload --skip-existing dist/*
+	twine upload --skip-existing dist/*
+
+.PHONY: regression-all
+release:
+	bash ./scripts/regression/run_release.sh 0 0,1 all
+
+.PHONY: regression-key
+key:
+	bash ./scripts/regression/run_release.sh 0 0,1 p0
2 changes: 1 addition & 1 deletion README.md
35 changes: 11 additions & 24 deletions README_cn.md
@@ -31,27 +31,14 @@

## News 📢

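-* 🔥 **2022.12.9: Released [PaddleNLP v2.4.5](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.5)**
-  * 📃 Released **UIE-X**, a multilingual, open-domain information extraction model that handles both text and document extraction, with outstanding zero-shot performance and few-shot transfer ability.
-  * 🔨 Industrial applications: added the [**end-to-end information extraction application solution**](./applications/information_extraction), which supports all kinds of text and document extraction scenarios and provides an industrial-grade workflow from data annotation and fine-tuning to deployment.
-* 🔥 **2022.11.28: Released [PaddleNLP v2.4.4](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.4)**
-  * 💪 Framework upgrade: added [**Huggingface Hub integration**](https://huggingface.co/PaddlePaddle); Model, Tokenizer, and Taskflow will gradually support loading directly from [Huggingface Hub](https://huggingface.co/PaddlePaddle); upgraded the [**few-shot Prompt API**](./docs/advanced_guide/prompt.md) with an implementation of the PET algorithm.
-  * 💎 NLP tools: the [**NLP pipeline system Pipelines**](./pipelines) further strengthens retrieval with the new interactive-learning semantic retrieval model [Ernie-Search](./pipelines/API.md); released [**SimpleServing**](./docs/server.md) for fast deployment of Taskflow and pretrained models.
-* 🔥 **2022.11.17: Released [PaddleNLP v2.4.3](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.3)**
-  * 💪 Framework upgrade: 🏆 upgraded the [**few-shot Prompt API**](./docs/advanced_guide/prompt.md) with more flexible prompt definitions, supporting the [FewCLUE AutoPrompt solution](https://mp.weixin.qq.com/s/_JPiAzFA1f0BZ0igdv-EKA); 🕸 upgraded the [**Trainer API**](./docs/trainer.md) with sharding and bf16 training, plus Seq2seqTrainer and IterableDataset support.
-  * 🔨 Industrial applications: 🏃 [**upgraded Universal Information Extraction (UIE)**](./model_zoo/uie) with quantization training and INT8 inference, further improving UIE inference speed.
-* 🔥 **2022.10.27: Released [PaddleNLP v2.4.2](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.2)**
-  * NLG expansion: added the 📄 [**Pegasus-based Chinese text summarization solution**](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/text_summarization/pegasus), with leading results; added the ❓ [**question generation solution**](./examples/question_generation), which provides a general question generation pretrained model trained on the industry-leading UNIMO-Text model and a large-scale, multi-domain question generation dataset. Both support one-line Taskflow calls and FasterGeneration high-performance inference, covering the full training, inference, and deployment workflow.
-  * Released 🖼 [**PPDiffusers**](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/ppdiffusers): a diffusion model toolbox that supports cross-modal (e.g. image and speech) training and inference, for quickly trying out and building on **Stable Diffusion**, with more models to come.
-
-* 🔥 **2022.10.14: Released [PaddleNLP v2.4.1](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.1)**
-  * 🧾 Released [**ERNIE-Layout**](./model_zoo/ernie-layout/), a multilingual, cross-modal, layout-enhanced large model for document intelligence that sets new SOTA results on 11 tasks. Also released **DocPrompt** 🔖, a document extraction and question answering model based on ERNIE-Layout that accurately understands the layout and semantics of document images and handles a wide range of business scenarios.
-
-* 🔥 **2022.9.6: Released [PaddleNLP v2.4](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.0)**
-  * 💎 NLP tools: released the **[NLP pipeline system Pipelines](./pipelines)**, which supports quickly building search engines and question answering systems and can be extended to all kinds of NLP systems, making NLP tasks as convenient, flexible, and efficient as stacking building blocks!
-  * 🔨 Industrial applications: added the **[end-to-end text classification application solution](./applications/text_classification)**, covering multi-class, multi-label, and hierarchical classification, with support for **few-shot learning** and **TrustAI** trustworthy-computing model training and tuning; [**upgraded Universal Information Extraction (UIE)**](./model_zoo/uie), released **UIE-M** for mixed Chinese-English extraction, and added the **UIE data distillation** solution, which breaks the UIE inference bottleneck and speeds up inference by more than 100x;
-  * 🍭 AIGC content generation: added the SOTA code generation model [**CodeGen**](./examples/code_generation/codegen), supporting code generation in multiple programming languages; integrated the [**popular text-to-image models**](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/model_zoo/taskflow.md#%E6%96%87%E5%9B%BE%E7%94%9F%E6%88%90) DALL·E Mini, Disco Diffusion, and Stable Diffusion, with more fun models waiting for you to try; added the [**Chinese text summarization application**](./applications/text_summarization), the first release of a Chinese summarization model based on a large-scale corpus, supporting one-line Taskflow calls and custom training;
-  * 💪 Framework upgrade: released the [**automatic model compression API**](./docs/compression.md), which automatically prunes and quantizes models and greatly lowers the barrier to using model compression; released the [**few-shot Prompt**](./applications/text_classification/multi_class/few-shot) capability, integrating classic algorithms such as PET, P-Tuning, and RGL.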
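+* 🔥 **Recently added**
+  * 📃 Released UIE-X, a multilingual, open-domain information extraction model that handles both text and document extraction, with strong few-shot transfer ability; added the [end-to-end information extraction application solution](./applications/information_extraction)
+  * ❣️ Released the [UIE-based opinion extraction and sentiment analysis application solution](./applications/sentiment_analysis/unified_sentiment_extraction), with strong few-shot ability. It supports sentence-level and attribute-level sentiment polarity classification, attribute extraction, and opinion extraction, and addresses attribute aggregation and implicit opinion extraction.
+* **2022.9.6: Released [PaddleNLP v2.4](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.0)**
+  * 💎 NLP tools: released the [NLP pipeline system Pipelines](./pipelines), which supports quickly building search engines and question answering systems and can be extended to all kinds of NLP systems, making NLP tasks as convenient, flexible, and efficient as stacking building blocks!
+  * 🔨 Industrial applications: added the [end-to-end text classification application solution](./applications/text_classification), covering multi-class, multi-label, and hierarchical classification, with support for few-shot learning and TrustAI trustworthy-computing model training and tuning.
+  * 🍭 AIGC: added the SOTA code generation model [CodeGen](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/examples/code_generation/codegen), supporting code generation in multiple programming languages; integrated the [popular text-to-image models](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/model_zoo/taskflow.md#文图生成) DALL·E Mini, Disco Diffusion, and Stable Diffusion, with more fun models waiting for you to try;
+  * 💪 Framework upgrade: released the [automatic model compression API](./docs/compression.md), which automatically prunes and quantizes models and greatly lowers the barrier to using model compression; released the [few-shot Prompt](./applications/text_classification/multi_class/few-shot) capability, integrating classic algorithms such as PET, P-Tuning, and RGL.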


## Community
@@ -257,7 +244,7 @@ AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_fast=True)

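For even better deployment performance, install FastTokenizer and set `use_fast=True` on the `AutoTokenizer` API to call the high-performance tokenization operators implemented in C++, which makes text processing more than 100x faster than the Python implementation. For more usage details, see the [FastTokenizer documentation](./fast_tokenizer)

-#### ⚡️ FasterGeneration: High-Performance Generation Acceleration Library
+#### ⚡️ FastGeneration: High-Performance Generation Acceleration Library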

<div align="center">
<img src="https://user-images.githubusercontent.com/11793384/168407831-914dced0-3a5a-40b8-8a65-ec82bf13e53c.gif" width="400">
@@ -268,10 +255,10 @@ model = GPTLMHeadModel.from_pretrained('gpt-cpm-large-cn')
...
outputs, _ = model.generate(
    input_ids=inputs_ids, max_length=10, decode_strategy='greedy_search',
-    use_faster=True)
+    use_fast=True)
```

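-Simply set `use_faster=True` on the `generate()` API to get a GPU speedup of more than 5x on generative pretrained models such as Transformer, GPT, BART, PLATO, and UniLM. For more usage details, see the [FasterGeneration documentation](./faster_generation)
+Simply set `use_fast=True` on the `generate()` API to get a GPU speedup of more than 5x on generative pretrained models such as Transformer, GPT, BART, PLATO, and UniLM. For more usage details, see the [FastGeneration documentation](./fast_generation)

#### 🚀 Fleet: PaddlePaddle 4D Hybrid Parallel Distributed Training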

36 changes: 13 additions & 23 deletions README_en.md
@@ -30,32 +30,22 @@

## News 📢

-* 🔥 **2022.12.9 [PaddleNLP v2.4.5](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.5)**
-  * 📃 Release **UIE-X**, a universal information extraction model which supports both document and text inputs.
-  * 🔨 Industrial application: Release [**Complete Solution of Information Extraction**](./applications/information_extraction), which supports most extraction tasks and provides a comprehensive, easy-to-use fine-tuning and customization workflow.
-* 🔥 **2022.11.28 [PaddleNLP v2.4.4](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.4)**
-  * 💪 Framework upgrade: Introduced [**Huggingface Hub Integration**](https://huggingface.co/PaddlePaddle) with a plan to gradually support all Models, Tokenizers and Taskflows to directly load from [Huggingface Hub](https://huggingface.co/PaddlePaddle); Added PET implementation to [**Prompt API**](./docs/advanced_guide/prompt.md).
-  * 💎 NLP Tool: [**Pipelines**](./pipelines) now supports Cross-Encoder [Ernie-Search](./pipelines/API.md) for Semantic Search; Released [**SimpleServing**](./docs/server.md), a quick out-of-box solution to deploy Taskflows and Pretrained Models.
-* 🔥 **2022.11.17 [PaddleNLP v2.4.3](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.3) Released!**
-  * 💪 Framework upgrade: 🏆 Upgrade [**Prompt API**](./docs/advanced_guide/prompt.md), supporting more flexible prompt definitions and winning the 1st place in [FewCLUE](https://mp.weixin.qq.com/s/_JPiAzFA1f0BZ0igdv-EKA); 🕸 Upgrade [**Trainer API**](./docs/trainer.md), supporting Seq2seqTrainer, IterableDataset as well as bf16 and sharding strategies.
-  * 🔨 Industrial application: 🏃 Upgrade for [**Universal Information Extraction**](./model_zoo/uie). Support **quantization aware training** and INT8 precision inference for inference performance boost.
-* 🔥 **2022.10.27 [PaddleNLP v2.4.2](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.2) Released!**
-  * NLG Upgrade: 📄 Release [**Solution of Text Summarization**](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/text_summarization/pegasus) based on Pegasus; ❓ Release [**Solution of Question Generation**](./examples/question_generation), providing a **general question generation pre-trained model** based on Baidu's UNIMO-Text and a large-scale multi-domain question generation dataset. Supports high-performance inference based on FasterGeneration and covers the whole process of training, inference and deployment.
-* 🔥 **2022.10.14 [PaddleNLP v2.4.1](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.1) Released!**
-  * 🧾 Release multilingual/cross-lingual pre-trained models [**ERNIE-Layout**](./model_zoo/ernie-layout/), which achieve new SOTA results in 11 downstream tasks. **DocPrompt** 🔖, based on ERNIE-Layout, is also released and supports multilingual document information extraction and question answering.
-* 🔥 **2022.9.6 [PaddleNLP v2.4](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.0) Released!**
-
-  * 💎 NLP Tool: **[Pipelines](./pipelines)** released. It supports fast construction of search engines and question answering systems and is extensible to all kinds of NLP systems. Build end-to-end pipelines for NLP tasks like playing with Lego!
-
-  * 🔨 Industrial application: Release **[Complete Solution of Text Classification](./applications/text_classification)**, covering multi-class, multi-label and hierarchical text classification; it also supports **few-shot learning** and the training and optimization of **TrustAI**. Upgrade [**Universal Information Extraction**](./model_zoo/uie) and release **UIE-M**, which supports both Chinese and English information extraction in a single model; release the data distillation solution for UIE to break the inference-time bottleneck.
-
-  * 🍭 AIGC: Release code generation SOTA model [**CodeGen**](./examples/code_generation/codegen), which supports code generation in multiple programming languages. Integrate [**Text to Image Model**](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/model_zoo/taskflow.md#%E6%96%87%E5%9B%BE%E7%94%9F%E6%88%90) DALL·E Mini, Disco Diffusion, Stable Diffusion, let's play and have some fun! Release [**Chinese Text Summarization Application**](./applications/text_summarization), the first release of a Chinese text summarization model pretrained on a large-scale corpus; it can be used via the Taskflow API and supports fine-tuning on your own data.
+* 🔥 **Latest Features**
+  * 📃 Release **[UIE-X](./applications/information_extraction)**, a universal information extraction model that supports both document and text inputs.
+  * ❣️ Release **[Opinion Mining and Sentiment Analysis Models](./applications/sentiment_analysis/unified_sentiment_extraction)** based on UIE, covering sentence-level and aspect-based sentiment classification, attribute extraction, opinion extraction, attribute aggregation and implicit opinion extraction.
+* **2022.9.6 [PaddleNLP v2.4](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.4.0) Released!**
+  * 💎 NLP Tools: Released **[Pipelines](./pipelines)**, which supports turn-key construction of search engines and question answering systems. Its flexible design applies to all kinds of NLP systems, so you can build end-to-end NLP pipelines like Legos!

+  * 🔨 Industrial application: Release **[Complete Solution of Text Classification](./applications/text_classification)**, covering multi-class, multi-label and hierarchical text classification; it also supports **few-shot learning** and the training and optimization of **TrustAI**. Upgrade [**UIE**](./model_zoo/uie) and release **UIE-M**, which supports both Chinese and English information extraction in a single model; release the data distillation solution for UIE to break the inference-time bottleneck.

+  * 🍭 AIGC: Release code generation SOTA model [**CodeGen**](./examples/code_generation/codegen), which supports code generation in multiple programming languages. Integrate [**Text to Image Model**](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/model_zoo/taskflow.md#%E6%96%87%E5%9B%BE%E7%94%9F%E6%88%90) DALL·E Mini, Disco Diffusion, Stable Diffusion, let's play and have some fun!

* 💪 Framework upgrade: Release [**Auto Model Compression API**](./docs/compression.md), which prunes and quantizes models automatically and lowers the barrier to model compression; release [**Few-shot Prompt**](./applications/text_classification/multi_class/few-shot), which includes algorithms such as PET, P-Tuning and RGL.





## Features

#### <a href=#out-of-box-nlp-toolset> 📦 Out-of-Box NLP Toolset </a>
@@ -242,7 +232,7 @@ AutoTokenizer.from_pretrained("ernie-3.0-medium-zh", use_fast=True)

Set `use_fast=True` to use the C++ tokenizer kernel and make text pre-processing about 100x faster. For more usage details, see [FastTokenizer](./fast_tokenizer).

-#### FasterGeneration: High Performance Generation Library
+#### FastGeneration: High Performance Generation Library

<div align="center">
<img src="https://user-images.githubusercontent.com/11793384/168407831-914dced0-3a5a-40b8-8a65-ec82bf13e53c.gif" width="400">
@@ -253,10 +243,10 @@ model = GPTLMHeadModel.from_pretrained('gpt-cpm-large-cn')
...
outputs, _ = model.generate(
    input_ids=inputs_ids, max_length=10, decode_strategy='greedy_search',
-    use_faster=True)
+    use_fast=True)
```

-Set `use_faster=True` to achieve 5x speedup for Transformer, GPT, BART, PLATO, UniLM text generation. For more usage please refer to [FasterGeneration](./faster_generation).
+Set `use_fast=True` to achieve 5x speedup for Transformer, GPT, BART, PLATO, UniLM text generation. For more usage please refer to [FastGeneration](./fast_generation).
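
The rename from `use_faster` to `use_fast` affects any caller that still passes the old keyword to `generate()`. Below is a minimal sketch, not taken from this commit, of how calling code could accept both spellings during a transition. `resolve_use_fast` is a hypothetical helper, not a PaddleNLP API; it only normalizes keyword arguments and does not call any model.

```python
import warnings


def resolve_use_fast(**kwargs) -> bool:
    """Return the effective `use_fast` value from generate()-style kwargs.

    Hypothetical helper: the old `use_faster` spelling is still accepted,
    triggers a DeprecationWarning, and is mapped to `use_fast`. An explicit
    `use_fast` always wins over `use_faster`.
    """
    if "use_faster" in kwargs:
        warnings.warn(
            "`use_faster` is deprecated; pass `use_fast` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # setdefault keeps an explicit `use_fast` if the caller passed both.
        kwargs.setdefault("use_fast", kwargs.pop("use_faster"))
    return bool(kwargs.get("use_fast", False))


if __name__ == "__main__":
    # Old-style call site: still works, but emits a DeprecationWarning.
    generate_kwargs = {
        "max_length": 10,
        "decode_strategy": "greedy_search",
        "use_fast": resolve_use_fast(use_faster=True),
    }
    print(generate_kwargs)
```

Letting an explicit `use_fast` take precedence keeps the behavior predictable when both keywords appear in the same call.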

#### 🚀 Fleet: 4D Hybrid Distributed Training

2 changes: 2 additions & 0 deletions applications/information_extraction/README.md
@@ -1,3 +1,5 @@
+简体中文 | [English](README_en.md)
+
# Information Extraction Application

**Table of Contents**