PPO Policy Bug in Parallel Mode #176

tianhan4 · 2022-01-06T07:40:58Z

PaParaZz1 · 2022-01-06T07:45:54Z

Can you offer your launch script or main entry file to help us reproduce your result?

tianhan4 · 2022-01-06T08:58:38Z

from easydict import EasyDict

cartpole_dqn_config = dict(
    exp_name='cartpole_ppo',
    env=dict(
        collector_env_num=8,
        collector_episode_num=2,
        evaluator_env_num=5,
        evaluator_episode_num=1,
        stop_value=195,
    ),
    policy=dict(
        cuda=False,
        action_space='discrete',
        model=dict(
            obs_shape=4,
            action_shape=2,
            action_space='discrete',
        ),
        learn=dict(
            batch_size=32,
            learning_rate=0.001,
            value_weight=0.5,
            entropy_weight=0.01,
            clip_ratio=0.2,
            learner=dict(
                learner_num=1,
                send_policy_freq=1,
            ),
        ),
        collect=dict(
            n_sample=256,
            unroll_len=1,
            discount_factor=0.9,
            gae_lambda=0.95,
            collector=dict(
                collector_num=2,
                update_policy_second=3,
            ),
        ),
        eval=dict(evaluator=dict(eval_freq=50, )),
        other=dict(
            eps=dict(
                type='exp',
                start=0.95,
                end=0.1,
                decay=100000,
            ),
            replay_buffer=dict(
                replay_buffer_size=100000,
                enable_track_used_data=False,
            ),
            commander=dict(
                collector_task_space=2,
                learner_task_space=1,
                eval_interval=5,
            ),
        ),
    ),
)
cartpole_dqn_config = EasyDict(cartpole_dqn_config)
main_config = cartpole_dqn_config

cartpole_dqn_create_config = dict(
    env=dict(
        type='cartpole',
        import_names=['dizoo.classic_control.cartpole.envs.cartpole_env'],
    ),
    env_manager=dict(type='base'),
    policy=dict(type='ppo_command'),
    learner=dict(type='base', import_names=['ding.worker.learner.base_learner']),
    collector=dict(
        type='zergling',
        import_names=['ding.worker.collector.zergling_parallel_collector'],
    ),
    commander=dict(
        type='solo',
        import_names=['ding.worker.coordinator.solo_parallel_commander'],
    ),
    comm_learner=dict(
        type='flask_fs',
        import_names=['ding.worker.learner.comm.flask_fs_learner'],
    ),
    comm_collector=dict(
        type='flask_fs',
        import_names=['ding.worker.collector.comm.flask_fs_collector'],
    ),
)
cartpole_dqn_create_config = EasyDict(cartpole_dqn_create_config)
create_config = cartpole_dqn_create_config

cartpole_dqn_system_config = dict(
    coordinator=dict(),
    path_data='./{}/data'.format(main_config.exp_name),
    path_policy='./{}/policy'.format(main_config.exp_name),
    communication_mode='auto',
    learner_gpu_num=1,
)
cartpole_dqn_system_config = EasyDict(cartpole_dqn_system_config)
system_config = cartpole_dqn_system_config

if __name__ == '__main__':
    from ding.entry.parallel_entry import parallel_pipeline
    parallel_pipeline([main_config, create_config, system_config], seed=9)

* fix/fix_submodule_err (opendilab#61) * fix/fix_submodule_err --------- Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu> * fix issue templates (opendilab#65) * fix(tokenizer): refactor tokenizer and update usage in readme (opendilab#51) * update tokenizer example * fix(readme, requirements): fix typo at Chinese readme and select a lower version of transformers (opendilab#73) * fix a typo in readme * in order to find InternLMTokenizer, select a lower version of Transformers --------- Co-authored-by: gouhchangjiang <gouhchangjiang@gmail.com> * [Doc] Add wechat and discord link in readme (opendilab#78) * Doc：add wechat and discord link * Doc：update wechat and discord link * Doc：update wechat and discord link * Doc：update wechat and discord link * Doc：update wechat and discord link * Doc：update wechat and discord link * Doc：update wechat and discord link * Doc：update wechat and discord link * Doc：update wechat and discord link * Doc：update wechat and discord link * Doc：update wechat and discord link * [Docs]: add Japanese README (opendilab#43) * Add Japanese README * Update README-ja-JP.md replace message * Update README-ja-JP.md * add repetition_penalty in GenerationConfig in web_demo.py (opendilab#48) Co-authored-by: YWMditto <862779238@qq.com> * use fp16 in instruction (opendilab#80) * [Enchancement] add more options for issue template (opendilab#77) * [Enchancement] add more options for issue template * update qustion icon * fix link * Use tempfile for convert2hf.py (opendilab#23) Fix InternLM/InternLM#50 * delete torch_dtype of README's example code (opendilab#100) * set the value of repetition_penalty to 1.0 to avoid random outputs (opendilab#99) * Update web_demo.py (opendilab#97) Remove meaningless log. * [Fix]Fix wrong string cutoff in the script for sft text tokenizing (opendilab#106) * docs(install.md): update dependency package transformers version to >= 4.28.0 (opendilab#124) Co-authored-by: 黄婷 <huangting3@CN0014010744M.local> * docs(LICENSE): add license (opendilab#125) * add license of colossalai and flash-attn * fix lint * modify the name * fix AutoModel map in convert2hf.py (opendilab#116) * variables are not printly as expect (opendilab#114) * feat(solver): fix code to adapt to torch2.0 and provide docker images (opendilab#128) * feat(solver): fix code to adapt to torch2.0 * docs(install.md): publish internlm environment image * docs(install.md): update dependency packages version * docs(install.md): update default image --------- Co-authored-by: 黄婷 <huangting3@CN0014010744M.local> * add demo test (opendilab#132) Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn> * fix web_demo cache accelerate (opendilab#133) * Doc: add twitter link (opendilab#141) * Feat add checkpoint fraction (opendilab#151) * feat(config): add checkpoint_fraction into config * feat: remove checkpoint_fraction from configs/7B_sft.py --------- Co-authored-by: wangguoteng.p <wangguoteng925@qq.com> * [Doc] update deployment guide to keep consistency with lmdeploy (opendilab#136) * update deployment guide * fix error * use llm partition (opendilab#159) Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn> * test(ci_scripts): clean test data after test, remove unnecessary global variables, and other optimizations (opendilab#165) * test: optimization of ci scripts(variables, test data cleaning, etc). * chore(workflows): disable ci job on push. * fix: update partition * test(ci_scripts): add install requirements automaticlly,trigger event about lint check and other optimizations (opendilab#174) * add pull_request in lint check * use default variables in ci_scripts * fix format * check and install requirements automaticlly * fix format --------- Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn> * feat(profiling): add a simple memory profiler (opendilab#89) * feat(profiling): add simple memory profiler * feat(profiling): add profiling argument * feat(CI_workflow): Add PR & Issue auto remove workflow (opendilab#184) * feat(ci_workflow): Add PR & Issue auto remove workflow Add a workflow for stale PR & Issue auto remove - pr & issue well be labeled as stale for inactive in 7 days - staled PR & Issue well be remove in 7 days - run this workflow every day on 1:30 a.m. * Update stale.yml * feat(bot): Create .owners.yml for Auto Assign (opendilab#176) * Create .owners.yml: for issue/pr assign automatically * Update .owners.yml * Update .owners.yml fix typo * [feat]: add pal reasoning script (opendilab#163) * [Feat] Add PAL inference script * Update README.md * Update tools/README.md Co-authored-by: BigDong <yudongwang1226@gmail.com> * Update tools/pal_inference.py Co-authored-by: BigDong <yudongwang1226@gmail.com> * Update pal script * Update README.md * restore .ore-commit-config.yaml * Update tools/README.md Co-authored-by: BigDong <yudongwang1226@gmail.com> * Update tools/README.md Co-authored-by: BigDong <yudongwang1226@gmail.com> * Update pal inference script * Update READMD.md * Update internlm/utils/interface.py Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com> * Update pal script * Update pal script * Update script * Add docstring * Update format * Update script * Update script * Update script --------- Co-authored-by: BigDong <yudongwang1226@gmail.com> Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com> * test(ci_scripts): add timeout settings and clean work after the slurm job (opendilab#185) * restore pr test on develop branch * add mask * add post action to cancel slurm job * remove readonly attribute on job log * add debug info * debug job log * try stdin * use stdin * set default value avoid error * try setting readonly on job log * performance echo * remove debug info * use squeue to check slurm job status * restore the lossed parm * litmit retry times * use exclusive to avoid port already in use * optimize loop body * remove partition * add {} for variables * set env variable for slurm partition --------- Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn> * refactor(tools): move interface.py and import it to web_demo (opendilab#195) * move interface.py and import it to web_demo * typo * fix(ci): fix lint error * fix(ci): fix lint error --------- Co-authored-by: Sun Peng <sunpengsdu@gmail.com> Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu> Co-authored-by: Kai Chen <chenkaidev@gmail.com> Co-authored-by: Yang Gao <Gary1546308416AL@gmail.com> Co-authored-by: Changjiang GOU <gouchangjiang@gmail.com> Co-authored-by: gouhchangjiang <gouhchangjiang@gmail.com> Co-authored-by: vansin <msnode@163.com> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: YWMditto <46778265+YWMditto@users.noreply.github.com> Co-authored-by: YWMditto <862779238@qq.com> Co-authored-by: WRH <12756472+wangruohui@users.noreply.github.com> Co-authored-by: liukuikun <24622904+Harold-lkk@users.noreply.github.com> Co-authored-by: x54-729 <45304952+x54-729@users.noreply.github.com> Co-authored-by: Shuo Zhang <zhangshuolove@live.com> Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com> Co-authored-by: 黄婷 <huangting3@CN0014010744M.local> Co-authored-by: ytxiong <45058324+yingtongxiong@users.noreply.github.com> Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> Co-authored-by: kkscilife <126147887+kkscilife@users.noreply.github.com> Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn> Co-authored-by: hw <45089338+MorningForest@users.noreply.github.com> Co-authored-by: Guoteng <32697156+SolenoidWGT@users.noreply.github.com> Co-authored-by: wangguoteng.p <wangguoteng925@qq.com> Co-authored-by: lvhan028 <lvhan_028@163.com> Co-authored-by: zachtzy <141206206+zachtzy@users.noreply.github.com> Co-authored-by: cx <759046501@qq.com> Co-authored-by: Jaylin Lee <61487970+APX103@users.noreply.github.com> Co-authored-by: del-zhenwu <dele.zhenwu@gmail.com> Co-authored-by: Shaoyuan Xie <66255889+Daniel-xsy@users.noreply.github.com> Co-authored-by: BigDong <yudongwang1226@gmail.com> Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com> Co-authored-by: huangting4201 <huangting3@sensetime.com>

PaParaZz1 added bug Something isn't working parallel-dist Parallel and distributed training related labels Jan 6, 2022

PaParaZz1 mentioned this issue May 15, 2022

feature(nyz): add new middleware distributed demo #321

Merged

7 tasks

PaParaZz1 closed this as completed Sep 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PPO Policy Bug in Parallel Mode #176

PPO Policy Bug in Parallel Mode #176

tianhan4 commented Jan 6, 2022 •

edited by PaParaZz1

Loading

PaParaZz1 commented Jan 6, 2022

tianhan4 commented Jan 6, 2022

PPO Policy Bug in Parallel Mode #176

PPO Policy Bug in Parallel Mode #176

Comments

tianhan4 commented Jan 6, 2022 • edited by PaParaZz1 Loading

PaParaZz1 commented Jan 6, 2022

tianhan4 commented Jan 6, 2022

tianhan4 commented Jan 6, 2022 •

edited by PaParaZz1

Loading