A multi-agent reinforcement learning library.
This is a modular framework for distributed multi-agent reinforcement learning. It consists of three main modules: i) single-/multi-agent algorithms, ii) the distributed training framework, and iii) games. This document first presents a usage guide for the framework and then describes the design of these three modules in turn.
- Easy to get started: even without a programming background, you can learn to run multi-machine hyperparameter-tuning experiments within half an hour.
- Modular design that is easy to extend: new algorithms and environments only need to follow the predefined interfaces to be plugged in and used.
- The built-in baseline algorithms achieve SOTA results on multiple benchmarks, including classic multi-agent test environments such as SMAC and GRF.
- A distributed training framework that supports self-play, asymmetric multi-population games, and evaluation.
The entry point of the single-/multi-agent algorithms is `algo/train.py`; each algorithm is defined by an `Agent`, and most of the interaction modules are defined in the `Runner` class.
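As a rough sketch of this division of labor (the class and method names below are only illustrative, not the library's actual API), the `Runner` drives environment interaction while the `Agent` implements the learning algorithm:

```python
# Illustrative sketch only -- names are hypothetical, not this library's API.

class Agent:
    """Defines the algorithm: how to act and how to update parameters."""

    def act(self, obs):
        return 0  # placeholder action selection

    def train(self, rollout):
        pass  # e.g. one PPO update on the collected rollout


class Runner:
    """Owns the interaction modules: steps environments and collects rollouts."""

    def __init__(self, env, agent, n_steps=20):
        self.env, self.agent, self.n_steps = env, agent, n_steps

    def run(self):
        obs = self.env.reset()
        rollout = []
        for _ in range(self.n_steps):
            action = self.agent.act(obs)
            obs, reward, done, info = self.env.step(action)
            rollout.append((obs, action, reward, done))
        return rollout
```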
All the following `python run/train.py` commands can be replaced by `python main.py`, which automatically detects unexpected halts caused by simulator errors and restarts the whole system accordingly. For stable simulators, `python run/train.py` is still the recommended way to go.
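Conceptually, the restart behavior can be pictured as a supervision loop like the following (a sketch under our own assumptions, not the actual contents of `main.py`):

```python
# Conceptual sketch of crash-and-restart supervision; this is NOT the actual main.py.
import subprocess
import sys

def supervise(cmd):
    """Re-launch the training command whenever it exits abnormally."""
    while True:
        ret = subprocess.call(cmd)
        if ret == 0:  # clean exit: training finished normally
            break
        print(f"training exited with code {ret}, restarting...", file=sys.stderr)

if __name__ == "__main__":
    supervise([sys.executable, "run/train.py",
               "-a", "sync-ppo", "-e", "smac-8m_vs_9m", "-c", "smac_th", "-dl", "th"])
```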
# two agents playing against each other
python run/train.py -a ppo -e template-temp -c template template
python run/train.py -a ppo -e template-temp -c template -kw uid2aid=0,0 uid2gid=0,0
# self-play
python run/train.py -a async-ppo -e template-temp -c template
# run in torch
python run/train.py -a sync-ppo -e smac-8m_vs_9m -c smac_th -dl th
In these commands:

- `-a` specifies the algorithm (dir: algo), optionally prefixed by the distributed architecture (dir: distributed); e.g. `sync-ppo` runs `ppo` with the synchronous architecture, and `async-ppo` runs it with the asynchronous one.
- `-e` selects the environment in the form `suite-name`; `template-temp` is the environment `temp` from the `template` suite, and `smac-8m_vs_9m` is `8m_vs_9m` from the SMAC suite.
- `-c` specifies the configuration(s), e.g. `smac_th`.
- `-dl` selects the deep learning library; `th` stands for Torch. For now only Torch is supported; JAX support is currently broken.
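Schematically, the training commands above follow this pattern (a summary of the flags shown in this README, not an exhaustive reference):

```
python run/train.py -a [<architecture>-]<algorithm> -e <suite>-<environment> -c <config> [<config2> ...] [-dl th] [-kw key=value ...] [-n <model_name>]
```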
By default, all checkpoints and logs are saved in `./logs/{env}/{algo}/{model_name}/`.
You can also make some simple changes to `*.yaml` from the command line:
# change learning rate to 0.0001, `lr` must appear in `*.yaml`
python run/train.py -a sync-hm -e unity-combat2d -kw lr=0.0001
This change is automatically reflected in Tensorboard, which makes it a recommended way to do simple hyperparameter tuning. Alternatively, you can modify configurations in `*.yaml` and specify `model_name` manually using the command-line argument `-n your_model_name`.
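For reference, a `-kw key=value` override only changes a key that already exists in the config; a hypothetical `*.yaml` excerpt (the layout is illustrative, not the library's actual schema) could look like:

```yaml
# hypothetical config excerpt -- only `lr` needs to exist for `-kw lr=0.0001` to apply
model_name: baseline
lr: 0.001      # overridden to 0.0001 by the command above
n_envs: 64
n_steps: 20
```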
python run/eval.py magw-logs/n_envs=64-n_steps=20-n_epochs=1/seed=4/ -n 1 -ne 1 -nr 1 -r -i eval -s 256 256 --fps 1
The above command shows how to evaluate a trained model, where

- `magw-logs/n_envs=64-n_steps=20-n_epochs=1/seed=4/` is the model path
- `-n` specifies the number of episodes to run
- `-ne` specifies the number of environments running in parallel
- `-nr` specifies the number of ray actors devoted to running
- `-r` renders the episodes and saves the video as a `*.gif` file
- `-i` specifies the video name
- `-s` specifies the screen size of the video
- `--fps` specifies the fps of the saved `*.gif` file
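Schematically, the evaluation command follows this pattern (a summary of the flags listed above, not an exhaustive reference):

```
python run/eval.py <model_path> -n <n_episodes> -ne <n_envs> -nr <n_ray_actors> [-r] [-i <video_name>] [-s <width> <height>] [--fps <fps>]
```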
In some multi-agent settings, we may prefer different configurations for different agents. The following command demonstrates running a multi-agent algorithm with multiple configurations, one for each agent.
# make sure `unity.yaml` and `unity2.yaml` exist in `configs/` directory
# the first agent is initialized with the configuration specified by `unity.yaml`,
# while the second agent is initialized with the configuration specified by `unity2.yaml`
python run/train.py -a sync-hm -e unity-combat2d -c unity unity2
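The two configuration files only need to differ where the agents should differ; a hypothetical pair of excerpts (the keys are illustrative only, not the library's actual schema):

```yaml
# configs/unity.yaml -- used by the first agent (hypothetical excerpt)
lr: 0.0003
```

```yaml
# configs/unity2.yaml -- used by the second agent (hypothetical excerpt)
lr: 0.0001
```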