2022-06-28-ao22a.md
---
title: 'EAT-C: Environment-Adversarial sub-Task Curriculum for Efficient Reinforcement Learning'
booktitle: Proceedings of the 39th International Conference on Machine Learning
abstract: >-
  Reinforcement learning (RL) is inefficient on long-horizon tasks due to sparse
  rewards, and its policies can be fragile to slightly perturbed environments. We
  address these challenges via a curriculum of tasks with coupled environments,
  generated by two policies trained jointly with RL: (1) a co-operative planning
  policy that recursively decomposes a hard task into a coarse-to-fine sub-task
  tree; and (2) an adversarial policy that modifies the environment in each
  sub-task. The two are complementary in acquiring more informative feedback for
  RL: (1) provides dense rewards on easier sub-tasks, while (2) makes the
  sub-tasks’ environments more challenging and diverse. Conversely, both are
  trained by RL’s dense feedback on sub-tasks, so the curriculum they generate
  stays adaptive to RL’s progress. The sub-task tree enables an easy-to-hard
  curriculum for every policy: its top-down construction gradually increases the
  number of sub-tasks the planner needs to generate, while the adversarial
  training between the environment and RL follows a bottom-up traversal that
  starts from a dense sequence of easier sub-tasks, allowing more frequent
  environment changes. We compare EAT-C with RL/planning methods targeting
  similar problems and with methods using environment generators or adversarial
  agents. Extensive experiments on diverse tasks demonstrate the advantages of
  our method in improving RL’s efficiency and generalization.
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: ao22a
month: 0
tex_title: "{EAT}-C: Environment-Adversarial sub-Task Curriculum for Efficient Reinforcement Learning"
firstpage: 822
lastpage: 843
page: 822-843
order: 822
cycles: false
bibtex_author: Ao, Shuang and Zhou, Tianyi and Jiang, Jing and Long, Guodong and Song, Xuan and Zhang, Chengqi
author:
- given: Shuang
  family: Ao
- given: Tianyi
  family: Zhou
- given: Jing
  family: Jiang
- given: Guodong
  family: Long
- given: Xuan
  family: Song
- given: Chengqi
  family: Zhang
date: 2022-06-28
address:
container-title: Proceedings of the 39th International Conference on Machine Learning
volume: 162
genre: inproceedings
issued:
  date-parts:
  - 2022
  - 6
  - 28
pdf:
extras:
---
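The abstract describes a two-part control flow: a planner builds a coarse-to-fine sub-task tree top-down, and training then traverses that tree bottom-up, from dense easy leaves to harder parents, with an adversary perturbing each sub-task's environment. The sketch below is a minimal Python illustration of that control flow only, not the paper's implementation; the `SubTask` class and the `decompose`, `bottom_up`, and `train` helpers are hypothetical stand-ins, and all RL and environment details are stubbed out.

```python
# Illustrative sketch of the curriculum structure implied by the abstract.
# Not from the authors' code: names and structure are assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubTask:
    name: str
    children: List["SubTask"] = field(default_factory=list)


def decompose(task: SubTask, depth: int) -> SubTask:
    """Top-down: recursively split a task into a coarse-to-fine sub-task tree."""
    if depth == 0:
        return task
    task.children = [
        decompose(SubTask(f"{task.name}.{i}"), depth - 1) for i in range(2)
    ]
    return task


def bottom_up(task: SubTask):
    """Bottom-up traversal: easier leaf sub-tasks first, harder parents later."""
    for child in task.children:
        yield from bottom_up(child)
    yield task


def train(root: SubTask, adversary_strength: float = 0.1) -> None:
    for sub in bottom_up(root):
        # An adversarial policy would perturb this sub-task's environment here;
        # the dense sequence of leaf sub-tasks tolerates more frequent changes.
        print(f"train RL on {sub.name} (env perturbation {adversary_strength:.2f})")


if __name__ == "__main__":
    tree = decompose(SubTask("reach-goal"), depth=2)
    train(tree)
```

Running the sketch prints the leaves of the tree before their parents, which is the easy-to-hard ordering the abstract attributes to the bottom-up traversal.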