This repo contains the implementation of the benchmark described in our paper Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi.
The codebase is largely based on the off-belief-learning repo.
The included hanabi-learning-environment is a modified version of the original HLE from DeepMind.
Please refer to the setup instructions in the off-belief-learning repo.
To pre-train Hanabi agents, run:
cd pyhanabi
sh scripts/iql.sh
To fine-tune an agent with pre-trained cooperative partners, use the following script:
cd pyhanabi
sh scripts/adaptation.sh
Note that, before running the script, --load_model
and --coop_agents
should be specified; they point to the learner checkpoint and to the directory of the cooperative partners' checkpoints, respectively.
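For illustration, here is a hedged sketch of how the two flags might be supplied. The checkpoint paths below are placeholders (not files shipped with the repo), and if adaptation.sh does not forward command-line arguments, set the same flags inside the script instead:

cd pyhanabi
sh scripts/adaptation.sh \
    --load_model path/to/learner_checkpoint.pthw \
    --coop_agents path/to/partner_checkpoints/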
To download the trained models used in the paper, go to the models folder and run:
sh download_pool.sh
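Once downloaded, these checkpoints can be used as the pre-trained partners for the adaptation step above, for example by pointing --coop_agents at the models folder; the exact layout produced by download_pool.sh may vary, so inspect the folder after the download finishes.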