Compression API supports distributed training #3361

LiuChiachi · 2022-09-23T10:13:11Z

PR types

New features

PR changes

APIs

Description

Compression API supports distributed training (DynaBERT)
Users in the WeChat group needed distributed pruning, so this PR adds support for it.

lugimzzz

LGTM

* fix multi-layer-inherit * update bert model unittest * update requirements.txt * update ernie modeling test * update roberta unittest * update roformer modeling testing * complete ernie label loss * complete ernie/roberta/roformer unittest * update label/loss * update refactor code * remove unrelated requirements * add license * Update setup.py and README Examples (#3208) * Move token_num fetch out of train cycle (#3089) * Add finance course (#3207) * add finance course group code Co-authored-by: tianxin <tianxin04@baidu.com> * Update README_cn.md (#3212) add v2.4 features description. * Update README.md (#3209) Improve and fix the text content of case 1. Co-authored-by: tianxin <tianxin04@baidu.com> * [Recompute] Update recompute for hybrid parallel interface. (#3211) Co-authored-by: Zhong Hui <zhonghui.net@gmail.com> * Update README_cn.md * [ModelingOutput]update roformer unittest (#3159) * add roformer unittest * add roformer unittest * update test_modeling * use relative import * reduce model config to accelerate testing * remove input_embedding from pretrained model * revert slow tag * update local branch * update get_vocab method * update get_vocab method * update test_chinese method * change absolute import * update unittest * update chinese test case * add roformer more output testing Co-authored-by: Guo Sheng <guosheng@baidu.com> Co-authored-by: liu zhengxi <380185688@qq.com> * Update README_cn.md * Fix windows dtype bug of neural search (#3182) * Fix windows dtype bug of neural search * Fix windows dtype bug of neural search Co-authored-by: 吴高升 <w5688414@gmail.com> * Update README_cn.md * Update README_cn.md * Update README_cn.md * [ModelingOutput]add more output for skep model (#3146) * update return_dict/label in skep model * complete skep add-more-output * refactor simple code Co-authored-by: Zhong Hui <zhonghui.net@gmail.com> Co-authored-by: Guo Sheng <guosheng@baidu.com> Co-authored-by: liu zhengxi <380185688@qq.com> * remove model_config_file and 
resource_files_names * Update README_cn.md (#3219) * Remove boost library. (#3215) * Remove boost library. * add conditional include for gtest * Add test, demo exclude * Update bos url for UIE (#3222) * Update bos url * Update README.md * Update README.md * 源码安装htbuilder,避免windows安装失败 (#3221) Co-authored-by: 吴高升 <w5688414@gmail.com> * not default to gpu (#3218) * Update codegen params and doc (#3228) * update decoding * update doc * update three models * [Unittest]add roformerv2 unittest (#2994) * add roformerv2 unittest * update roformer-v2 testing * update config to accelerate testing * remove comment Co-authored-by: Guo Sheng <guosheng@baidu.com> * Optimize text classification deploy (#3217) * optimize_deploy * optimize_deploy * update_readme * fix data distill for UIE (#3231) * fix data distill * update * add evaluate_teacher * [Pre-Training] ERNIE-CW pre-training tasks docs. (#3111) * add ernie-large config * update * update clue finetune. * unused delete. * update * support no nsp for enrie. * fix evaluation * fix amp o2 save_dtype bugs. * extand ernie. * fix ernie pretrain with ## vocab. * extend vocab * support custom tokenizer. * add some comments. * fix bugs. * add comments. * fix bug. * fix run_pretrain_static logging. * fix all gather. * fix a100 * fix * fix bugs * fix save * tmp commit for pre-process. * Update README.md * Update README.md * add amp o1 support * ernie cw readme. * fix * throw error when dataset is invalid. * update document. * refine readme. * fix * refactor * refator2 * Add pre-training introduction. * update image width. * refine doc * fit table width. * fix c++ style * fix table * refine docs * refine model_zoo/ernie-1.0/README.md * readfine readme. * fix link * fix bug * fix documents. * add weight. 
* fix config * Update README.md & Add more data into csv& change UI (#3237) * fix bug of label dimension smaller than 1 (#3238) * update output dirname of compression api (#3252) * [ModelingOutput] add tinybert/Electra/XLNet/ALBERT/ERNIE-M more output & loss (#3148) * complete tinybert more output & loss * complete tinybert/erniem output * complete xlnet unittest * complete the electra unittest * complete albert more modeling output * complete albert more modeling output * complete ernie-doc model more output * revert ernie-doc modeling * update more output * update model testing * convert paddle.is_tensor -> isinstance * update tinybert & electra models * Add unit tests for T5 (#3115) * analysis_module_bug_fix (#3246) * [CodeStyle] Add copyright for python file. (#3259) * Add copyright for python files. * [IssueTemplate] Add issue template (#3251) * update issue-template * remove old issue template * add id field to template * update github issue template * [BugFix]update vocab_size in init_config (#3260) * update vocab_size in init_config * make update_init_config more common Co-authored-by: Zhong Hui <zhonghui.net@gmail.com> * update t5 tests (#3266) * Update debug mode for relation prompt (#3263) * update debug mode for relation prompt * update * update * Update README.md and Rename dir to FAQ directory (#3272) * [DOC] Add ernie-1.0-base-zh-cw benchmark results. (#3248) * [DOC] Update highlights of README.md (#3278) * Update README.md * Update README.md * Add unit tests for UnifiedTransformer (#3177) * [Trainer] Support recompute for trainer. (#3261) * support recompute for trainer. 
* Upgrade FAQ finance to Milvus 2.1 (#3267) * Upgrade FAQ finance to Milvus 2.1 * Update text format for faq * Update feature_extract.sh * Fix ft substr bug (#3279) * optimize cmakelist * Add substr pos check * remove glog/logging.h (#3280) * Update ft version to 0.2.0 (#3285) * update docs wechat code (#3284) * update link typo (#3236) * add_dataset_link (#3286) * Add use_faster flag for uie of taskflow. (#3194) * Add use_faster flag for taskflow * Add empty line * Add doc of uie * remove faster_tokenizer tmp * merge * fix import error (#2853) * [TIPC]Support @to_static train for base-transformer (#3277) * [TIPC]Support @to_static train for base-transformer * Fix to_static args * Add ft compile doc and scripts (#3292) * Fix the mac compile * Add cpp, python lib building scripts * Remove cache in cpp lib * Add compile docs * fix ft build script (#3293) * Add Milvus2.1 Support and Update pipielines qa ui (#3283) * Add Milvus Support and Update pipielines qa ui * Remove unused comments * fix bug of relation example is empty (#3295) * Compression API Supports ERNIE-M and more Pretrained models (#3234) * update compression doc * update compression doc * support more models and update compression api * update inputspec info, avoid error * optimize train.py (#3300) * update ernie task tipc * update * optimize_sparse_strategy (#3311) * Add FAQ and missing json output files (#3298) * Add Docker compile Support for Pipelines (#3315) * Add Docker compile Support * change cuda to uppercase * Update README_en.md (#3320) * Update README_en.md * Update README_en.md * Update README_en.md * Update README_en.md * Update README_en.md * Update README_en.md * Update README_en.md * Update __init__.py * Replace OMP with std::thread (#3309) * fix bug and codestyle * save change * change code style * fix conflict * change h file * Update tokenizer.cc Co-authored-by: zhoushunjie <zhoushunjie@baidu.com> Co-authored-by: Zeyu Chen <chenzeyu01@baidu.com> * update tipc log (#3333) * Remove 
unused function of Pipelines (#3330) * update CodeGen doc (#3299) * update doc * update doc * update docs Co-authored-by: 骑马小猫 <1435130236@qq.com> * fix tipc log (#3337) * [MoE] Fix recompute & communication api (#3338) * update moe recompute. * [few-shot] fix typo and failed links (#3339) Co-authored-by: Zhong Hui <zhonghui.net@gmail.com> * [New Model]add t5-encoder-model (#3168) * add t5-encoder-model * update t5model * update t5encoder & test modeling * update t5 * update type hinting * update cache type annotation * Update retrieval based classification README.md (#3322) * Update retrieval based classification README.md * Revert predict.py * Update cpu predict script * restore gpu config * Fix TIPC log path (#3347) * Upgrade Neural Search README.md (#3350) * support layoutxlm re dygraph to static (#3325) * support layoutxlm re dygraph to static * fix error * upgrade-modeling-output (#3305) * upgrade-modeling-output * fix codestyle * Compression API supports ELECTRA (#3324) * supports electra * fix typo * [FasterGeneration] MBart supports dy2sta (#3356) * unimo unittests (#3349) * [Benchamrk] Fix fuse_transformer option of TIPC (#3358) * Fix the README description of Pipelines & Neural Search (#3353) * Fix the README description * Update Pipelines README.md * Update Docker README.md * Add more details for ranking model * supports distribute (#3361) * Fix the semantic search example mistakes (#3363) Co-authored-by: Zeyu Chen <chenzeyu01@baidu.com> * [BugFix] Fix amp usage for evaluation. (#3303) * fix eval of amp usage. 
* fix * [MoE] Fix distributed wait api (#3365) * Fix gpt example attention mask (#3240) * add hf ds and upgrade example * fix attention mask * update * update attention mask * fix static attention mask * Fix erniegen no model_config_file (#3321) * fix * rm save_pretrained * fix tipc log for benchmark and upate bigru_crf config (#3373) * fix tipc log * fix tipc log and upate bigru_crf config * add t5 encoder model (#3376) * MBART supports freeze multi-lingual model when dy2sta (#3367) * fix dataloader memory overflow * add warning * Update README_en.md (#3375) edit typo Co-authored-by: Zeyu Chen <chenzeyu01@baidu.com> * Improve CodeGen (#3371) * Add codegen unittests (#3348) * add codegen unittests * fix codegen * update * [BugFix] fix supporting `OrderedDict` bug in paddle.jit module (#3364) * convert keys to `__dict__` * use fields to get keys Co-authored-by: Guo Sheng <guosheng@baidu.com> * 【Hackathon + GradientCache】 (#1799) * gradient_cache * gradient_cache * gradient_cache * gradient_cache * data * train_for_gradient_cache * add * add * add * 修改 * 修改 * update * update * update * update * Update README_gradient_cache.md * Update README_gradient_cache.md * Update README_gradient_cache.md * feat: modified the code * fix: delete useless code * feat: added requirements.txt * feat: modify readme * feat: modify some code * feat: code style * feat: add function * feat: add licence * feat: add comments * Update README_gradient_cache.md * feat: modify readme * feat: modify readme * fix: copyright * fix: yapf * feat: modify readme * feat: modify readme * feat: delete useless code * feat: add new explain Co-authored-by: 吴高升 <wugaosheng@mails.ccnu.edu.cn> Co-authored-by: 吴高升 <w5688414@gmail.com> * [TIPC] Add scripts for npu and xpu, test=develop (#3377) * add scripts for xpu and npu * add npu/xpu args * add script for xpu * add npu/xpu args to predict.py * fix codestyle ci bug * add copyright * fix copyright_checker * Add ERNIE-LayoutX (#3183) * Add ernie-layoutx * 
simplify code * simplify code * support batch input * add word_boxes support * Update docs * update * Update README.md * Udpate README.md * Update README.md * Update README.md * [Dygraph] Support sharding stage2/3+dp in GPT-3 model (#2471) * add sharding+dp * update * code style check Co-authored-by: gongenlei <gongel@qq.com> * complete t5 more output (#3370) * fix gpt N4C32 dp script bug (#3392) * codestyle * Update README.md of neural search (#3391) * Update artist model activateion (#3106) * update * rename * fix gpt ut (#3407) * add qg example * delete useless scripts * delete .sh files in t5 dir * normalize t5 naming * rewrite run_gen.py to train.py and predict.py in unimo-text * Update README_cn.md (#3413) * fix bigru crf offset index error (#3418) * modified according to zeyang's comments * modified according to zeyang's comments * fix bert unittest bug (#3422) * fix bert unittest bug * change token_labels -> sequence_labels * [BugFix]Fix ernie tokenizer unittest (#3423) * fix bert unittest bug * change token_labels -> sequence_labels * update ernie tokenizer max_input_size * update qg example readme * fix pillow deperate warning (#3404) Co-authored-by: gongenlei <gongel@qq.com> * Update taskflow.py (#3424) fix typo * fix bug of debug mode (#3417) * rewrite unimo-text/predict.py to retrain only the prediction function * support paddle serving http deploy for text classification (#3378) * add_http_deploy * [prompt] add doc (#3362) * modified according to zeyang's comments, 20221010 * [few-shot] fix script for multi_class and fix input type for windows (#3426) * Update README_cn.md * adjust the position of the experiment' result * support mlu training (#3431) * support mlu training * [mlu] add mlu config in rnn and ernie-1.0 README. 
* remove the tcn for the paddlenlp (#3435) * add qg-taskflow * fix code style * Add multi type files index update example for pipelines (#3439) * [MLU] support SQuAD_Bert with mlu device (#3434) * Update FAQ Finance Paddle Serving dependencies (#3430) * Add batch prediction for pipelines (#3432) * Add batch prediction for pipelines * Fix some hardcode problem& Update comments * Support past_key_values argument for Electra (#3411) * unit test pass; fix yapf * change docstring Co-authored-by: 骑马小猫 <1435130236@qq.com> Co-authored-by: Guo Sheng <guosheng@baidu.com> * modified according to zeyang's comments * refine gpt (#3447) * fix some typos in qg-example readme * Fix #3446 (#3457) * update Pillow version * compare version * [NEW Features] feature_extraction and processor support from_pretrained (#3453) * update * add import * Update README.md and optimize DocPrompt postprocess (#3441) * Update README.md * optimize sort * update * Update * Update * Update * Update * Update * Update * update * update * Add english docs and rename ernie_layout * Add english docs and rename ernie_layout * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * Update taskflow.md * update * add symbolic link for ernie_layout * Update README.md Co-authored-by: wj-Mcat <1435130236@qq.com> Co-authored-by: yujun <50394665+JunnYu@users.noreply.github.com> Co-authored-by: 吴高升 <w5688414@gmail.com> Co-authored-by: limingshu <61349199+JamesLim-sy@users.noreply.github.com> Co-authored-by: chenxiaozeng <chenshuo07@baidu.com> Co-authored-by: tianxin <tianxin04@baidu.com> Co-authored-by: Guo Sheng <guosheng@baidu.com> Co-authored-by: bruce0210 <100854336+bruce0210@users.noreply.github.com> Co-authored-by: wuhuachaocoding <77733235+wuhuachaocoding@users.noreply.github.com> Co-authored-by: Zhong Hui <zhonghui.net@gmail.com> 
Co-authored-by: wawltor <fangzeyang0904@hotmail.com> Co-authored-by: liu zhengxi <380185688@qq.com> Co-authored-by: kztao <taokuizu@qq.com> Co-authored-by: Jack Zhou <zhoushunjie@baidu.com> Co-authored-by: paopjian <672034519@qq.com> Co-authored-by: gongenlei <gongel@qq.com> Co-authored-by: lugimzzz <63761690+lugimzzz@users.noreply.github.com> Co-authored-by: Jiaqi Liu <709153940@qq.com> Co-authored-by: WangZhen <23097963+0x45f@users.noreply.github.com> Co-authored-by: Thomas Young <35565423+HexToString@users.noreply.github.com> Co-authored-by: Zeyu Chen <chenzeyu01@baidu.com> Co-authored-by: zhengya01 <43601548+zhengya01@users.noreply.github.com> Co-authored-by: Roc <30228238+sljlp@users.noreply.github.com> Co-authored-by: Noel <wanghuijuan03@baidu.com> Co-authored-by: zhoujun <572459439@qq.com> Co-authored-by: Liujie0926 <44688141+Liujie0926@users.noreply.github.com> Co-authored-by: westfish <westfish@126.com> Co-authored-by: Septilliony <52767905+Septilliony@users.noreply.github.com> Co-authored-by: Elvis Stuart <75023175+Elvisambition@users.noreply.github.com> Co-authored-by: 吴高升 <wugaosheng@mails.ccnu.edu.cn> Co-authored-by: duanyanhui <45005871+YanhuiDua@users.noreply.github.com> Co-authored-by: Haohongxiang <86215757+haohongxiang@users.noreply.github.com> Co-authored-by: Yam <40912707+Yam0214@users.noreply.github.com> Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com> Co-authored-by: alkaid <41095516+alkaideemo@users.noreply.github.com> Co-authored-by: Chenxiao Niu <ncx_bupt@163.com> Co-authored-by: qipengh <qipengh@qq.com> Co-authored-by: Sijun He <sijun.he@hotmail.com>

supports distribute

4b7f964

LiuChiachi self-assigned this Sep 23, 2022

LiuChiachi added the model-compression label Sep 23, 2022

LiuChiachi requested review from wawltor and lugimzzz September 23, 2022 10:18

lugimzzz approved these changes Sep 23, 2022

Merge branch 'develop' into compression-api-supports-distribute

57237c0

LiuChiachi merged commit fb69d15 into PaddlePaddle:develop Sep 23, 2022

LiuChiachi mentioned this pull request Oct 13, 2022

PaddleNLP 2.4.1 Release Note Candidate #3448

Closed
