Code for the paper Lifelong Reinforcement Learning with Modulating Masks: an implementation of modulating masks combined with PPO. The repository contains the MASK RI/LC/BLC implementations. Please see the `exp_ewc` branch for the implementation of PPO with Online EWC.
The code was developed on top of the existing DeepRL repository, extending the PPO agents to the lifelong learning setting.
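The core idea is to keep a fixed, randomly initialized backbone network and learn a separate modulating mask per task. Below is a minimal sketch of that mechanism (illustrative only, not the repository's exact code): a linear layer whose frozen weights are gated by a per-task binary mask, obtained by thresholding trainable score tensors with a straight-through gradient estimator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThresholdMask(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores):
        # Binary mask: 1 where the score is positive, 0 elsewhere.
        return (scores > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: gradients flow to the scores unchanged.
        return grad_output

class MaskedLinear(nn.Module):
    def __init__(self, in_features, out_features, num_tasks):
        super().__init__()
        # Frozen, randomly initialized backbone weights.
        weight = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(weight, a=5 ** 0.5)
        self.weight = nn.Parameter(weight, requires_grad=False)
        # One trainable score tensor per task; a random initialization of
        # the scores corresponds to the MASK RI setting.
        self.scores = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(out_features, in_features))
             for _ in range(num_tasks)]
        )

    def forward(self, x, task_id):
        mask = ThresholdMask.apply(self.scores[task_id])
        return F.linear(x, self.weight * mask)
```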
Python 3.9 and PyTorch 1.12.0 were used for experiments in the paper.
For MASK RI/LC/BLC experiments and other baselines in the Procgen benchmark, please visit this repository.
Supported environments:
- CT-graph
- Minigrid
- Continual World (see the note below)
Requirements:
- See the requirements.txt file.
- See the CT-graph requirements.
- See the Minigrid requirements.
- See the Continual World requirements and installation instructions. Note: MuJoCo (now freely available) is required to run Continual World.
The example commands below use the Minigrid environment. To run agents in the Minigrid (MG10) curriculum defined in the paper, use the commands below:
```
# baseline PPO agent
python train_minigrid.py baseline --seed 86

# random initialization of mask per task (MASK RI) agent
python train_minigrid.py ll_supermask --new_task_mask random --seed 86

# linear combination of masks (MASK LC) agent
python train_minigrid.py ll_supermask --new_task_mask linear_comb --seed 86
```
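For context, the MASK LC agent builds the mask for a new task from masks already learned: the new task's scores are a learned linear combination of the previous tasks' score tensors. A hedged sketch of that idea follows (hypothetical function and variable names, not the repository's API):

```python
import torch

def lc_scores(prev_scores, betas):
    # prev_scores: score tensors learned on earlier tasks
    # betas: trainable combination coefficients, one per earlier task,
    #        optimized jointly while training on the new task
    weights = torch.softmax(betas, dim=0)
    return sum(w * s for w, s in zip(weights, prev_scores))

# Example: combine two previously learned score tensors.
prev = [torch.randn(4, 4), torch.randn(4, 4)]
betas = torch.nn.Parameter(torch.zeros(len(prev)))
new_task_scores = lc_scores(prev, betas)
```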
Note:
- The command to run a balanced linear combination (MASK BLC) agent is the same as the MASK LC command above, but should be run in the `exp_maskblc` git branch.
- The full list of commands to run the experiments in the paper can be found in the `paper_experiments.txt` file.
- Sample commands and the full list of commands for the `ewc` experiments in the paper can be found in the `exp_ewc` git branch.
- Sample commands and the full list of commands for setting up the single task expert (STE) experiments can be found in the `exp_ste` git branch.
- In the Continual World curriculum (CW10), the random initialization mask agent implemented in this branch is MASK RI_C (continuous-valued mask). The sample command to run MASK RI_D in CW10 can be found in the `exp_maskri_discrete_mask_cw10` git branch.
To cite this work, please use the information below. Thanks.
@article{esbn2022masklrl,
title={Lifelong Reinforcement Learning with Modulating Masks},
author={Ben-Iwhiwhu, Eseoghene and Nath, Saptarshi and Pilly, Praveen K and Kolouri, Soheil and Soltoggio, Andrea},
journal={arXiv preprint arXiv:2212.11110},
year={2022}
}
If you encounter any bugs in the code, please raise an issue in this repository on GitHub.
The Continual World benchmark was built on top of the Meta-World benchmark, which comprises a number of simulated robotics tasks. The originally released Continual World used version 1 (v1) of the Meta-World environments. However, the Meta-World v1 environments contained some issues in the reward functions (discussed here and here), which were fixed in the updated v2 environments. Therefore, the experiments in the paper use the v2 environment for each task in Continual World. The modification can be downloaded from the forked repository here.
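As an illustration only, v2 environments can be instantiated through the public Meta-World API as sketched below (the paper uses the forked repository linked above; the task name here is just an example):

```python
import random
import metaworld

# Build a benchmark containing a single v2 task (push-v2 used as an example).
ml1 = metaworld.ML1('push-v2')
env = ml1.train_classes['push-v2']()          # instantiate the environment
env.set_task(random.choice(ml1.train_tasks))  # pick a concrete goal
obs = env.reset()
```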