Diverse PSRO is a variant of the Policy-Space Response Oracles (PSRO) algorithm that encourages training a behaviourally diverse population of policies by drawing on the theory of determinantal point processes (DPPs). This approach makes it possible to train less exploitable, more diverse strategies, and it also provides a new, geometrically interpretable way of measuring population diversity.
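The DPP-based diversity measure can be illustrated with a minimal sketch. It assumes, for illustration only and not as this repository's code, that each policy is represented by its row of payoffs `M` and that the DPP's L-ensemble kernel is the Gram matrix `L = M M^T`. Under that assumption, the expected size of a DPP sample, `Tr(I - (L + I)^-1)`, grows as the payoff vectors become less redundant, and `det(L)` is the squared volume spanned by those vectors, which is where the geometric reading comes from. The function names `dpp_diversity` and `squared_volume` are hypothetical.

```python
import numpy as np


def dpp_diversity(payoffs: np.ndarray) -> float:
    """Expected cardinality of the L-ensemble DPP with kernel L = M M^T (assumed kernel)."""
    kernel = payoffs @ payoffs.T  # Gram matrix of the policies' payoff vectors
    eye = np.eye(kernel.shape[0])
    # For an L-ensemble DPP, E[|Y|] = Tr(I - (L + I)^{-1}).
    return float(np.trace(eye - np.linalg.inv(kernel + eye)))


def squared_volume(payoffs: np.ndarray) -> float:
    """det(L): squared volume spanned by the payoff vectors (0 if they are linearly dependent)."""
    return float(np.linalg.det(payoffs @ payoffs.T))


distinct = np.array([[1.0, 0.0], [0.0, 1.0]])   # two orthogonal payoff vectors
redundant = np.array([[1.0, 0.0], [1.0, 0.0]])  # the same policy added twice

print(round(dpp_diversity(distinct), 3))   # 1.0
print(round(dpp_diversity(redundant), 3))  # 0.667
```

Duplicating a policy adds nothing to the spanned volume, so the redundant population scores lower on both measures than the orthogonal one.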
To run the code in this repository, first clone it:
```bash
git clone https://github.com/diversepsro/diverse_psro
cd diverse_psro
```
Then create and activate a new Anaconda environment:
```bash
conda env create -f environment.yml
conda activate diverse_psro
```
To run the Random Games of Skill experiment, execute:
```bash
python3 random_games_skill.py
```
To run the Real World Meta-Games experiment, execute:
```bash
python3 spinning_tops_dpp.py
```
To run the Non-transitive Mixture Model experiment, execute:
```bash
python3 non_mixture_model.py
```
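Diverse PSRO is evaluated in three settings. The table below lists the diverse oracle used in each one; the two games of skill share the diverse best response oracle, while the mixture model uses diverse gradient ascent.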
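| Game | Oracle | Script |
|---|---|---|
| Random Games of Skill | Diverse best response (BR) | `random_games_skill.py` |
| Real World Meta-Games | Diverse best response (BR) | `spinning_tops_dpp.py` |
| Non-transitive Mixture Model | Diverse gradient ascent | `non_mixture_model.py` |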
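The diverse best response oracle can be sketched over a finite pool of candidate strategies. This is an illustration under stated assumptions, not the repository's implementation: it assumes a symmetric zero-sum meta-game given by an antisymmetric payoff matrix, scores each candidate by its expected payoff against the opponent meta-strategy plus `tau` times the DPP diversity (expected cardinality) of the enlarged population, and takes each policy's feature vector to be its payoffs against the current population. The names `diverse_best_response`, `expected_cardinality` and `tau` are hypothetical.

```python
import numpy as np


def expected_cardinality(features: np.ndarray) -> float:
    # Assumed L-ensemble kernel L = F F^T; E[|Y|] = Tr(I - (L + I)^{-1}).
    kernel = features @ features.T
    eye = np.eye(kernel.shape[0])
    return float(np.trace(eye - np.linalg.inv(kernel + eye)))


def diverse_best_response(payoffs, population, meta_strategy, candidates, tau):
    """Hypothetical helper: pick the candidate maximising reward + tau * diversity."""
    best_idx, best_score = None, -np.inf
    for c in candidates:
        # Expected payoff of candidate c against the opponent meta-strategy.
        reward = payoffs[c, population] @ meta_strategy
        # Assumed features: payoffs of every member (plus c) against the current population.
        members = list(population) + [c]
        features = payoffs[np.ix_(members, population)]
        score = reward + tau * expected_cardinality(features)
        if score > best_score:
            best_idx, best_score = c, score
    return best_idx


# Usage on a random symmetric zero-sum game (antisymmetric payoff matrix).
rng = np.random.default_rng(0)
w = rng.normal(size=(6, 6))
game = w - w.T
print(diverse_best_response(game, [0, 1], np.array([0.5, 0.5]), [2, 3, 4, 5], tau=1.0))
```

With `tau = 0` this reduces to an ordinary best response against the meta-strategy; larger `tau` trades some payoff for candidates whose payoff vectors add more volume to the population. For continuous policies, as in the Non-transitive Mixture Model, one would presumably replace the argmax over a finite list with gradient steps on the same kind of objective; this sketch does not implement that variant.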