feature(xjx): some incompatible refactorings #195

sailxjx · 2022-01-24T06:13:52Z

While writing the league code, I made some backward-incompatible refactorings. I'd like to merge this into main as soon as possible so that future code doesn't call interfaces that no longer exist.

Changelog:

Merge the task.emit and task.emit_remote methods.
Support the Slurm platform in ditask.
Move the middleware code from main_league into the middleware directory.
Rename the 'task.emit' event to 'task._emit'.
Fix some bugs in Parallel.
Stop syncing context between processes.
Support str in match_labels.
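The sketch below illustrates the first and last changelog items. It uses a toy in-process router in place of the real Parallel transport. `ToyTask`, `ToyRouter`, and the `only_local`/`only_remote` keywords are hypothetical names chosen for illustration, not DI-engine's actual API. Only the `task._emit` topic name comes from the changelog. A single `emit` triggers local listeners and forwards the event to peers on that internal topic. `match_labels` accepts either one label string or a collection of labels.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Iterable, List, Set, Tuple, Union

# Internal topic that carries forwarded events (name taken from the changelog).
REMOTE_EMIT_TOPIC = "task._emit"


class ToyRouter:
    """Hypothetical in-process stand-in for the inter-process transport."""

    def __init__(self) -> None:
        self._subs: List[Tuple[str, object, Callable[..., None]]] = []

    def subscribe(self, topic: str, owner: object, fn: Callable[..., None]) -> None:
        self._subs.append((topic, owner, fn))

    def send(self, sender: object, topic: str, *payload: Any) -> None:
        # Deliver to every subscriber of the topic except the sender itself.
        for sub_topic, owner, fn in self._subs:
            if sub_topic == topic and owner is not sender:
                fn(*payload)


class ToyTask:
    def __init__(self, router: ToyRouter, labels: Iterable[str] = ()) -> None:
        self.router = router
        self.labels: Set[str] = set(labels)
        self._listeners: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)
        # Events forwarded by peers arrive on the internal topic.
        router.subscribe(REMOTE_EMIT_TOPIC, self, self._trigger)

    def on(self, event: str, fn: Callable[..., None]) -> None:
        self._listeners[event].append(fn)

    def emit(self, event: str, *args: Any, only_local: bool = False, only_remote: bool = False) -> None:
        # One entry point in place of a separate emit/emit_remote pair.
        if not only_remote:
            self._trigger(event, *args)
        if not only_local:
            self.router.send(self, REMOTE_EMIT_TOPIC, event, *args)

    def _trigger(self, event: str, *args: Any) -> None:
        for fn in self._listeners[event]:
            fn(*args)

    def match_labels(self, patterns: Union[str, Iterable[str]]) -> bool:
        # Accept a single label string as well as a collection of labels.
        if isinstance(patterns, str):
            patterns = [patterns]
        return all(p in self.labels for p in patterns)


router = ToyRouter()
learner = ToyTask(router, labels={"learner"})
collector = ToyTask(router, labels={"collector", "league"})
received: List[Tuple[str, int]] = []
learner.on("model_updated", lambda step: received.append(("learner", step)))
collector.on("model_updated", lambda step: received.append(("collector", step)))
learner.emit("model_updated", 10)                   # local listener, then peers
learner.emit("model_updated", 11, only_local=True)  # local listener only
print(received)
```

The `isinstance(patterns, str)` check matters because a bare string is itself iterable. Without it, `match_labels("league")` would test the characters 'l', 'e', 'a', ... instead of the whole label.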

codecov · 2022-01-24T06:26:02Z

Codecov Report

Merging #195 (4ec230e) into main (aee2676) will decrease coverage by 0.06%.
The diff coverage is 46.66%.

@@            Coverage Diff             @@
##             main     #195      +/-   ##
==========================================
- Coverage   84.16%   84.09%   -0.07%     
==========================================
  Files         439      447       +8     
  Lines       33800    33937     +137     
==========================================
+ Hits        28447    28541      +94     
- Misses       5353     5396      +43
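You can check the diff totals directly. Each delta is the PR value minus the main value, and line coverage is hits divided by tracked lines. The sketch below assumes Codecov's default display setting of two decimal places, rounded down. Under that assumption it reproduces both 84.16% and 84.09%.

It also suggests why the header reports a 0.06% decrease while the table shows -0.07%:

- The header appears to truncate the unrounded difference, which is about 0.063 points.
- The table subtracts the already-rounded figures.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Totals:
    files: int
    lines: int
    hits: int
    misses: int

    def percent(self) -> float:
        # Line coverage: hit lines over all tracked lines.
        return 100.0 * self.hits / self.lines


def round_down(value: float, digits: int = 2) -> float:
    # Drop digits past `digits` decimal places (assumed Codecov default; values here are positive).
    scale = 10 ** digits
    return math.floor(value * scale) / scale


main = Totals(files=439, lines=33800, hits=28447, misses=5353)
pr = Totals(files=447, lines=33937, hits=28541, misses=5396)

# In this report every tracked line is either a hit or a miss.
for totals in (main, pr):
    assert totals.hits + totals.misses == totals.lines

exact_delta = pr.percent() - main.percent()
shown_main, shown_pr = round_down(main.percent()), round_down(pr.percent())
print(f"main {main.percent():.4f}% shown as {shown_main:.2f}%")
print(f"pr   {pr.percent():.4f}% shown as {shown_pr:.2f}%")
print(f"exact delta {exact_delta:+.4f} pts, delta of shown values {shown_pr - shown_main:+.2f} pts")
```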

Flag	Coverage Δ
unittests	`84.09% <46.66%> (-0.07%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
ding/entry/main.py	`0.00% <0.00%> (ø)`
ding/entry/main_league.py	`0.00% <0.00%> (ø)`
ding/framework/middleware/__init__.py	`0.00% <0.00%> (ø)`
ding/framework/middleware/league_collector.py	`0.00% <0.00%> (ø)`
ding/framework/middleware/league_dispatcher.py	`0.00% <0.00%> (ø)`
ding/framework/middleware/league_evaluator.py	`0.00% <0.00%> (ø)`
ding/framework/middleware/league_learner.py	`0.00% <0.00%> (ø)`
ding/entry/cli_ditask.py	`36.00% <18.51%> (-7.40%)`	⬇️
ding/framework/parallel.py	`84.43% <81.81%> (-1.20%)`	⬇️
ding/entry/cli_parsers/slurm_parser.py	`90.27% <90.27%> (ø)`
... and 10 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aee2676...4ec230e.

PaParaZz1 · 2022-01-24T18:15:47Z

ding/entry/cli_parsers/slurm_parser.py

+
+class SlurmParser():
+
+    def __init__(self, platform_spec, **kwargs) -> None:


Add Python type annotations here so the typing lint passes.
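Below is a sketch of the constructor with type annotations. It is extended with the parse step quoted in the next review comment. Several details are assumptions for illustration:

- the layout of `platform_spec`, a `tasks` list indexed by Slurm rank;
- the `mode` and `port` keys.

`SLURM_PROCID` and `SLURMD_NODENAME` are standard variables that Slurm sets for each task.

```python
import os
from typing import Any, Dict, List, Mapping, Optional


class SlurmParser:
    """Typed sketch; the platform_spec layout is an assumption, not DI-engine's format."""

    def __init__(self, platform_spec: Optional[Dict[str, Any]], **kwargs: Any) -> None:
        self.platform_spec: Dict[str, Any] = platform_spec or {}
        self.kwargs: Dict[str, Any] = kwargs

    def parse(self, env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
        env = os.environ if env is None else env
        procid = int(env["SLURM_PROCID"])  # rank of this task in the job step
        nodename = env["SLURMD_NODENAME"]  # node the task runs on
        task = self._get_node_args(procid)
        # Validation: the spec entry for this rank must describe this node.
        assert task["address"] == nodename
        # Keys from `task` override same-named keys from self.kwargs.
        return {**self.kwargs, **task}

    def _get_node_args(self, procid: int) -> Dict[str, Any]:
        # Assumed layout: {"tasks": [{"address": ...}, ...]}, one entry per rank.
        tasks: List[Dict[str, Any]] = self.platform_spec.get("tasks", [])
        return tasks[procid]


spec = {"tasks": [{"address": "node1", "port": 50515}, {"address": "node2", "port": 50516}]}
parser = SlurmParser(spec, mode="league")
args = parser.parse({"SLURM_PROCID": "1", "SLURMD_NODENAME": "node2"})
print(args)
```

The task's entry is unpacked last, so per-task values override any shared keyword arguments with the same key.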

PaParaZz1 · 2022-01-24T18:16:31Z

ding/entry/cli_parsers/slurm_parser.py

+        task = self._get_node_args(procid)
+        # Validation
+        assert task["address"] == nodename
+        return {**self.kwargs, **task}


Deep-copy self.kwargs for safety.

sailxjx: This expression does not overwrite any keys of self.kwargs, so a deep copy isn't needed here.
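The reply holds for top-level keys: `{**self.kwargs, **task}` builds a new dict and leaves `self.kwargs` untouched. The merge is shallow, though, so nested mutable values are shared between the two dicts. A deep copy would only matter if a caller mutated such values in place. The key names below are illustrative.

```python
import copy
from typing import Any, Dict

kwargs: Dict[str, Any] = {"mode": "league", "labels": ["collector"]}
task: Dict[str, Any] = {"address": "node2", "mode": "local"}

# A new dict is built; on duplicate keys the right-hand mapping wins.
merged = {**kwargs, **task}
print(merged["mode"], kwargs["mode"])  # kwargs keeps its own top-level value

# The copy is shallow: merged["labels"] is the very same list object.
merged["labels"].append("evaluator")
print(kwargs["labels"])  # the in-place append is visible through kwargs too

# Deep-copying first isolates nested values from later in-place edits.
isolated = {**copy.deepcopy(kwargs), **task}
isolated["labels"].append("learner")
print(kwargs["labels"], isolated["labels"])
```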

PaParaZz1 · 2022-01-24T18:20:58Z

ding/entry/cli_parsers/tests/test_slurm_parser.py

+
+@pytest.fixture
+def set_slurm_env():
+    os.environ["SLURM_NTASKS"] = '6'  # parameter n: total number of processes/tasks


Use an English comment.
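The quoted fixture line assigns to `os.environ` directly. Unless the fixture undoes it later, the variables persist into subsequent tests. One alternative, sketched below with the standard library's `unittest.mock.patch.dict`, restores the environment on exit. pytest's built-in `monkeypatch.setenv` achieves the same.

The values mirror the `srun -n6 --ntasks-per-node=3` example from the launch script in this PR. The node names and the `slurm_env` helper are hypothetical.

```python
import os
from contextlib import contextmanager
from typing import Dict, Iterator
from unittest import mock

# Illustrative values for a 6-task job with 3 tasks per node (hypothetical node names).
SLURM_ENV: Dict[str, str] = {
    "SLURM_NTASKS": "6",           # parameter n: total number of processes/tasks
    "SLURM_NTASKS_PER_NODE": "3",  # --ntasks-per-node
    "SLURM_NODELIST": "node[1-2]",
    "SLURM_PROCID": "0",
    "SLURMD_NODENAME": "node1",
}


@contextmanager
def slurm_env(**overrides: str) -> Iterator[None]:
    """Temporarily set Slurm variables; os.environ is restored on exit."""
    with mock.patch.dict(os.environ, {**SLURM_ENV, **overrides}):
        yield


with slurm_env(SLURM_PROCID="3", SLURMD_NODENAME="node2"):
    print(os.environ["SLURM_NTASKS"], os.environ["SLURM_PROCID"])
```

Inside a pytest fixture, the body becomes `with slurm_env(): yield`.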

PaParaZz1 · 2022-01-24T18:22:18Z

ding/entry/cli_parsers/slurm_parser.py

+            nodelist = []
+            for tail in tails.split(","):
+                if "-" in tail:
+                    start, stop = tail.split("-")


This branch is not covered by tests.
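To exercise the uncovered range branch, one option is to pull the expansion into a small function and test it with inputs that contain ranges. The sketch below is a hypothetical `expand_nodelist`, not the parser's actual code. It handles Slurm's compressed hostlist syntax with one bracket group per host (e.g. `node[1-3,5]`) and keeps zero padding. It does not handle multiple bracket groups in one name. On a cluster, `scontrol show hostnames` performs the authoritative expansion.

```python
import re
from typing import List

# One host entry: a prefix plus an optional bracketed range list, e.g. "node[01-03,5]".
_ENTRY = re.compile(r"([^,\[]+)(?:\[([^\]]+)\])?")


def expand_nodelist(nodelist: str) -> List[str]:
    nodes: List[str] = []
    pos = 0
    while pos < len(nodelist):
        match = _ENTRY.match(nodelist, pos)
        if match is None:
            raise ValueError(f"cannot parse nodelist: {nodelist!r}")
        prefix, tails = match.group(1), match.group(2)
        if tails is None:
            nodes.append(prefix)  # plain host name without brackets
        else:
            for tail in tails.split(","):
                if "-" in tail:
                    # The range branch: expand start..stop, keeping zero padding.
                    start, stop = tail.split("-")
                    width = len(start)
                    for i in range(int(start), int(stop) + 1):
                        nodes.append(f"{prefix}{i:0{width}d}")
                else:
                    nodes.append(prefix + tail)
        pos = match.end()
        if pos < len(nodelist):
            if nodelist[pos] != ",":
                raise ValueError(f"unexpected character at {pos} in {nodelist!r}")
            pos += 1  # skip the comma between host entries
    return nodes


print(expand_nodelist("node[1-3,5],login01"))
```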

PaParaZz1 · 2022-01-24T18:25:00Z

ding/scripts/main_league_slurm.sh

+export LC_ALL=en_US.utf-8
+export LANG=en_US.utf-8
+BASEDIR=$(dirname "$0")
+# srun -p Cerebra_Share --quotatype=reserved --mpi=pmi2 -n6 --ntasks-per-node=3 bash ding/scripts/main_league_slurm.sh


Remove sensitive information.
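One way to keep cluster-specific names out of committed scripts is to supply them at launch time. The helper below is a hypothetical sketch. It builds the `srun` command from the comment, with the partition either passed in or read from an environment variable (`DING_SLURM_PARTITION` is an invented name). Site-specific flags such as `--quotatype` are left to `extra_args`. `-p`, `--mpi`, `-n`, and `--ntasks-per-node` are standard `srun` options.

```python
import os
import shlex
from typing import List, Optional, Sequence


def build_srun_command(
    script: str,
    ntasks: int,
    ntasks_per_node: int,
    partition: Optional[str] = None,
    extra_args: Sequence[str] = (),
) -> List[str]:
    # Take the partition from the caller or the (hypothetical) environment variable,
    # so the committed script never names a specific cluster partition.
    partition = partition or os.environ.get("DING_SLURM_PARTITION")
    if not partition:
        raise ValueError("pass partition= or set DING_SLURM_PARTITION")
    return [
        "srun", "-p", partition, *extra_args, "--mpi=pmi2",
        f"-n{ntasks}", f"--ntasks-per-node={ntasks_per_node}",
        "bash", script,
    ]


cmd = build_srun_command("ding/scripts/main_league_slurm.sh", 6, 3, partition="my_partition")
print(shlex.join(cmd))
```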


sailxjx added 13 commits January 24, 2022 14:08

faf20b6 Sep in files
bd8d0b8 Remove emit_remote
7ff1f7d Add slurm parser
162e751 Move to script
659b1a0 Fix encoding
7b30ff2 print traceback
787caeb Change test
800188f a
2ff06b3 a
b6d5c12 a
533ea0a Fix type
506c56e a
22569f9 Remove useless code

sailxjx added 4 commits January 24, 2022 18:48

1a909b9 Support str in match_labels
a69169d Don't sync ctx
17db735 Sync finish event between processes
ba2fd5b Stop before renew context

PaParaZz1 approved these changes Jan 24, 2022


4ec230e Fix pr

sailxjx merged commit aba0c62 into main Jan 25, 2022

sailxjx deleted the feature/league branch January 25, 2022 02:29
