
Commit

change readme
nlpzhezhao committed Mar 9, 2024
1 parent 69f4413 commit 4d31484
Showing 2 changed files with 16 additions and 15 deletions.
README.md (21 changes: 11 additions & 10 deletions)
@@ -7,7 +7,7 @@

<img src="logo.jpg" width="390" height="390" align="left" />

- Pre-training has become an essential part of NLP. UER-py (Universal Encoder Representations) is a toolkit for pre-training on general-domain corpora and fine-tuning on downstream tasks. UER-py maintains model modularity and supports research extensibility. It facilitates the use of existing pre-training models and provides interfaces for users to extend it further. With UER-py, we have built a model zoo that contains pre-trained models of different properties. **See the Wiki for [Full Documentation](https://github.com/dbiir/UER-py/wiki)**.
+ Pre-training has become an essential part of NLP. UER-py (Universal Encoder Representations) is a toolkit for pre-training on general-domain corpora and fine-tuning on downstream tasks. UER-py maintains model modularity and supports research extensibility. It facilitates the use of existing pre-training models and provides interfaces for users to extend it further. With UER-py, we have built a model zoo that contains pre-trained models of different properties. **See the [UER-py project Wiki](https://github.com/dbiir/UER-py/wiki) for full documentation**.

<br/>
<br/>
@@ -160,19 +160,20 @@ UER-py is organized as follows:
```
UER-py/
|--uer/
- | |--embeddings/ # contains embeddings
- | |--encoders/ # contains encoders such as RNN, CNN
- | |--decoders/ # contains decoders
- | |--targets/ # contains targets such as language modeling, masked language modeling
- | |--layers/ # contains frequently-used NN layers, such as embedding layer, normalization layer
- | |--models/ # contains model.py, which combines embedding, encoder, and target modules
+ | |--embeddings/ # contains modules of the embedding component
+ | |--encoders/ # contains modules of the encoder component, such as RNN, CNN, Transformer
+ | |--decoders/ # contains modules of the decoder component
+ | |--targets/ # contains modules of the target component, such as language modeling, masked language modeling
+ | |--layers/ # contains frequently-used NN layers
+ | |--models/ # contains model.py, which combines modules of different components
| |--utils/ # contains frequently-used utilities
| |--model_builder.py
| |--model_loader.py
| |--model_saver.py
| |--opts.py
| |--trainer.py
|
- |--corpora/ # contains corpora for pre-training
+ |--corpora/ # contains pre-training data
|--datasets/ # contains downstream tasks
|--models/ # contains pre-trained models, vocabularies, and configuration files
|--scripts/ # contains useful scripts for pre-training models
@@ -184,7 +185,7 @@
|--README.md
|--README_ZH.md
|--requirements.txt
- |--logo.jpg
+ |--LICENSE
```
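The *uer/models* directory above holds model.py, which stitches the components together. As a rough sketch of the idea (illustrative only, not UER-py's actual model.py; the class and argument names here are assumptions):

```python
import torch.nn as nn

# Minimal sketch of assembling a pre-training model from interchangeable
# embedding, encoder, and target modules (not UER-py's actual code).
class Model(nn.Module):
    def __init__(self, embedding, encoder, target):
        super().__init__()
        self.embedding = embedding  # e.g. word + position + segment embedding
        self.encoder = encoder      # e.g. a Transformer, RNN, or CNN encoder
        self.target = target        # e.g. a masked-language-modeling head

    def forward(self, src, tgt, seg):
        emb = self.embedding(src, seg)   # token ids -> embedding vectors
        hidden = self.encoder(emb, seg)  # embeddings -> contextual hidden states
        return self.target(hidden, tgt)  # hidden states -> pre-training loss
```

Swapping in a different encoder or target module then yields a different pre-training model without touching the rest of the pipeline.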

@@ -214,7 +215,7 @@ UER-py has been used in winning solutions of many NLP competitions.
<br/>

## Contact information
- For communication related to this project, please contact Zhe Zhao (helloworld@ruc.edu.cn; nlpzhezhao@tencent.com) or Yudong Li (liyudong123@hotmail.com) or Cheng Hou (chenghoubupt@bupt.edu.cn) or Wenhang Shi (wenhangshi@ruc.edu.cn).
+ For communication related to this project, please contact Zhe Zhao (helloworld@alu.ruc.edu.cn; nlpzhezhao@tencent.com), Yudong Li (liyudong123@hotmail.com), Cheng Hou (chenghoubupt@bupt.edu.cn), or Wenhang Shi (wenhangshi@ruc.edu.cn).

This work is guided by my industry mentors __Qi Ju__, __Xuefeng Yang__, __Haotang Deng__ and my school mentors __Tao Liu__, __Xiaoyong Du__.

README_ZH.md (10 changes: 5 additions & 5 deletions)
@@ -7,7 +7,7 @@

<img src="logo.jpg" width="390" height="390" align="left" />

- Pre-training has become an essential part of NLP and has brought significant improvements to a wide range of NLP tasks. UER-py (Universal Encoder Representations) is a toolkit for pre-training on general-domain corpora and fine-tuning on downstream tasks. UER-py follows a modular design principle. By combining modules, users can quickly and precisely reproduce existing pre-trained models, and use the provided interfaces to develop new ones. With UER-py, we have built a model zoo that contains pre-trained models of different properties (e.g. based on different encoders and target tasks). Users can choose a suitable pre-trained model from it according to the requirements of their task. **See the project Wiki for the [full documentation](https://github.com/dbiir/UER-py/wiki/主页)**
+ Pre-training has become an essential part of NLP and has brought significant improvements to a wide range of NLP tasks. UER-py (Universal Encoder Representations) is a toolkit for pre-training on general-domain corpora and fine-tuning on downstream tasks. UER-py follows a modular design principle. By combining modules, users can quickly and precisely reproduce existing pre-trained models, and use the provided interfaces to develop new ones. With UER-py, we have built a model zoo that contains pre-trained models of different properties (e.g. based on different corpora, encoders, and target tasks). Users can choose a suitable pre-trained model from it according to the requirements of their task. **See the [project Wiki](https://github.com/dbiir/UER-py/wiki/主页) for the full documentation**


<br>
@@ -36,10 +36,10 @@
UER-py has the following advantages:
- __Reproducibility__ UER-py has been tested on many datasets and matches the performance of the original pre-trained model implementations (e.g. BERT, GPT-2, ELMo, T5)
- __Modularity__ UER-py uses a decoupled, modular design framework. The framework is divided into components such as Embedding, Encoder, and Target. The components have clear interfaces, and each contains a rich set of modules. Modules can be combined to build pre-trained models with different properties
- - __Model training__ UER-py supports CPU, single-machine single-GPU, single-machine multi-GPU, and multi-machine multi-GPU training modes
- - __Model zoo__ We maintain and continuously release pre-trained models. Users can choose a suitable pre-trained model according to the requirements of their task
+ - __Model training__ UER-py supports single-machine CPU, single-machine GPU, and multi-machine multi-GPU training modes (a minimal sketch of the multi-GPU pattern follows this list)
+ - __Model zoo__ We maintain and release pre-trained models. Users can choose a suitable pre-trained model according to the requirements of their task
- __SOTA results__ UER-py supports a comprehensive range of downstream tasks, including text classification, text-pair classification, sequence labeling, and reading comprehension, and provides multiple competition-winning solutions
- - __Pre-training-related features__ UER-py provides rich pre-training-related features and optimizations, including feature extraction, synonym retrieval, pre-trained model conversion, model ensembling, and text generation
+ - __Pre-training-related features__ UER-py provides rich pre-training-related features, including feature extraction, synonym retrieval, pre-trained model conversion, model ensembling, and text generation
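As a rough illustration of the multi-GPU training mode named above, here is a minimal PyTorch sketch of the pattern such trainers are built on. It is illustrative only, not UER-py's trainer.py; the port number and the stand-in model are placeholders.

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    # One process per GPU; `rank` selects this process's device.
    dist.init_process_group("nccl", init_method="tcp://127.0.0.1:29500",
                            rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = torch.nn.Linear(768, 768).cuda(rank)  # stand-in for a real pre-training model
    model = DDP(model, device_ids=[rank])         # gradients synchronize across ranks
    # ... training loop: each rank consumes its own shard of the dataset ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()        # one worker per visible GPU
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```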


<br/>
@@ -75,7 +75,7 @@ doc2-sent1
doc3-sent1
doc3-sent2
```
- The book review corpus is obtained from the book review classification dataset by removing the labels. We split each review in the middle so that it forms a two-sentence document; see *book_review_bert.txt* in the *corpora* folder.
+ The book review corpus is obtained from the book review sentiment classification dataset by removing the labels. We split each review in the middle so that it forms a two-sentence document; see *book_review_bert.txt* in the *corpora* folder.
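To make the described processing concrete, here is a minimal sketch of how such a corpus could be produced from the labeled file. It is illustrative only; the `label<TAB>review` input layout and the file names are assumptions, not the dataset's confirmed format.

```python
# Illustrative sketch: drop the labels from a review dataset and split each
# review in the middle, producing one two-sentence "document" per review,
# with documents separated by a blank line (assumed input layout).
def build_bert_corpus(labeled_path="book_review.txt",
                      corpus_path="book_review_bert.txt"):
    with open(labeled_path, encoding="utf-8") as fin, \
         open(corpus_path, "w", encoding="utf-8") as fout:
        for line in fin:
            _label, review = line.rstrip("\n").split("\t", 1)  # discard the label
            mid = len(review) // 2                             # split point: middle of the review
            fout.write(review[:mid] + "\n")    # first half as sentence 1
            fout.write(review[mid:] + "\n\n")  # second half as sentence 2; blank line ends the document
```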

The format of the classification dataset is as follows:
```
