This repo contains code and models for Continuous Coordination As a Realistic Scenario for Lifelong Learning, a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings. Our setup is based on Hanabi, a partially observable, fully cooperative multi-agent game.
Lifelong Hanabi consists of three phases: (1) pre-training, (2) continual training, and (3) testing.
The code is built on top of the Other-Play & Simplified Action Decoder in Hanabi repo.
The build process is tested with Python 3.7, PyTorch 1.5.1, CUDA 10.1, cuDNN 7.6, and NCCL 2.4.
# clone the repo
git clone --recursive git@github.com:chandar-lab/Lifelong-Hanabi.git
cd Lifelong-Hanabi
# create new conda env
conda create -n lifelong_hanabi python=3.7
conda activate lifelong_hanabi
pip install -r requirements.txt
# build
mkdir build
cd build
cmake ..
make
mv hanalearn.cpython-37m-x86_64-linux-gnu.so ..
mv rela/rela.cpython-37m-x86_64-linux-gnu.so ..
mv hanabi-learning-environment/libpyhanabi.so ../hanabi-learning-environment/
Once the build is done and the .so files have been moved to the locations shown above, each subsequent session only needs:
conda activate lifelong_hanabi
export PYTHONPATH=/path/to/lifelong_hanabi:$PYTHONPATH
export OMP_NUM_THREADS=1
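One way to confirm that the environment is set up is to check that the compiled extensions can be found on PYTHONPATH. The snippet below is a minimal sketch: the module names hanalearn and rela are inferred from the .so filenames in the build step, and missing_modules is a hypothetical helper that is not part of this repo.

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names, taken from the .so files produced by the build.
    missing = missing_modules(["hanalearn", "rela"])
    if missing:
        print("Not found on PYTHONPATH:", ", ".join(missing))
    else:
        print("Compiled modules found.")
```

If a module is reported as missing, re-check that the .so files were moved out of build and that PYTHONPATH points at the repo root.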
Run the following commands to download the pre-trained agents used in the paper.
pip install gdown
gdown --id 1rpmTPIT-g026pdQfAwHoE4i8tP7Qj2vI
A detailed description of each agent's configuration and architecture is available in results/Pre-trained agents pool for Continual Hanabi.xlsx.
all_pretrained_pool.zip contains the pre-trained agents we used in our experiments (the pool can be extended by training more expert Hanabi players).
To run any .sh file, update <path-to-pretrained-model-pool-dir> and <save-dir> accordingly.
Important flags are:
| Flags | Description |
|---|---|
| --sad | enables Simplified Action Decoder |
| --pred_weight | weight for auxiliary task (typically 0.25) |
| --shuffle_color | enable other-play |
| --seed | seed |
For details of other hyperparameters, refer to the code and/or the paper.
A sample script is provided in pyhanabi/tools/pretrain.sh and can be run as follows:
cd pyhanabi
sh tools/pretrain.sh
To evaluate all agents against each other (cross-play), run:
cd pyhanabi
sh generate_cp.sh
The cross-play matrix from our runs can be found in results/scores_data_100_nrun5.csv (results/sem_data_100_nrun5.csv contains the standard error of the mean).
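A cross-play matrix is often summarized by comparing its diagonal (each agent paired with itself) against its off-diagonal entries (each agent paired with a different partner). The sketch below shows that summary on an in-memory matrix. The layout of the CSV files is not described here, so loading them is left out, and summarize_cross_play and the example scores are hypothetical.

```python
def summarize_cross_play(scores):
    """Summarize a square matrix where scores[i][j] is the mean score of agent i paired with agent j."""
    n = len(scores)
    self_play = [scores[i][i] for i in range(n)]
    cross_play = [scores[i][j] for i in range(n) for j in range(n) if i != j]
    return {
        "self_play_mean": sum(self_play) / n,
        # A single agent has no cross-play pairs, so report NaN in that case.
        "cross_play_mean": sum(cross_play) / len(cross_play) if cross_play else float("nan"),
    }


# Made-up scores for two agents, for illustration only.
example = [
    [20.0, 4.0],
    [6.0, 22.0],
]
summary = summarize_cross_play(example)
```

A large gap between the two means indicates agents that play well with themselves but coordinate poorly with other partners.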
To train the learner with a set of 5 partners using, for example, the ER method, run:
cd pyhanabi
sh tools/continual_learning_scripts/ER_easy_interactive.sh
Zero-shot and few-shot checkpoints will be stored in <save-dir>.
Similar scripts are available for all the other algorithms described in the paper.
To log the continual training results (from the checkpoints stored in <save-dir>), run:
cd pyhanabi
sh tools/continual_evaluation.sh
To implement a new lifelong learning algorithm, modify one of the following, depending on the type of algorithm:
Memory-based methods: episodic_memory is a list of the replay buffers from previous tasks. You can change how the replayed batch is collected or how this replayed batch constrains the current gradients.
Regularization-based methods: the Fisher information matrix is estimated at the end of each task. You can modify how the corresponding regularization loss is calculated and added to the original loss.
Training regimes: the following hyper-parameters have been shown to have a high impact on the performance of lifelong learning algorithms.
| Flags | Description |
|---|---|
| --optim_name | optimizer |
| --batchsize | batch size |
| --decay_lr | learning rate decay |
| --initial_lr | initial learning rate |
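The memory-based and regularization-based extension points can be illustrated with a small, framework-free sketch. It is not this repo's implementation. Here episodic_memory is modeled as a plain list of lists, and gradients and parameters as flat lists of floats. The names sample_replay_batch, project_gradient, diagonal_fisher, and ewc_penalty are hypothetical. The gradient projection follows the A-GEM rule and the penalty follows EWC, both used only as well-known examples of each family.

```python
import random


def sample_replay_batch(episodic_memory, batch_size, rng=random):
    """Draw a batch uniformly from the pooled buffers of all previous tasks."""
    pool = [item for task_buffer in episodic_memory for item in task_buffer]
    if not pool:
        return []
    return rng.sample(pool, min(batch_size, len(pool)))


def project_gradient(grad, ref_grad):
    """A-GEM-style constraint: if grad conflicts with the replay gradient, remove the conflicting component."""
    dot = sum(g * r for g, r in zip(grad, ref_grad))
    if dot >= 0:
        return list(grad)  # no conflict, keep the gradient unchanged
    ref_norm_sq = sum(r * r for r in ref_grad)  # positive whenever dot < 0
    return [g - (dot / ref_norm_sq) * r for g, r in zip(grad, ref_grad)]


def diagonal_fisher(per_sample_grads):
    """Estimate the diagonal of the Fisher information as the mean squared per-sample gradient."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(params, anchor_params, fisher, strength):
    """EWC-style quadratic penalty that is added to the task loss."""
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for f, p, a in zip(fisher, params, anchor_params)
    )
```

A new memory-based method would typically change the sampling or the projection step, while a new regularization-based method would change how the importance weights or the penalty are computed.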
To evaluate the learner against a set of unseen agents, run:
cd pyhanabi
sh tools/testing.sh
Logging continual training results and testing both require a wandb account, which is used to plot the results.
All plots and experiment details are available in the wandb report.
- Other code used to reproduce figures in the paper can be found in results
If you find this work useful, please consider citing our paper.
@misc{nekoei2021continuous,
title={Continuous Coordination As a Realistic Scenario for Lifelong Learning},
author={Hadi Nekoei and Akilesh Badrinaaraayanan and Aaron Courville and Sarath Chandar},
year={2021},
eprint={2103.03216},
archivePrefix={arXiv},
primaryClass={cs.LG}
}