→ See the full article here
This repository contains the source code for the project "Centralized control for multi-agent RL in a complex Real-Time-Strategy game", which was submitted as the final project in the COMP579 - Reinforcement Learning course at McGill, taught by Prof. Doina Precup in Winter 2023.
→ The main scripts for understanding the code are fully commented. We present the PDF report and the code in the following sections.
→ The full report of the project is available here.
→ The Weights & Biases logs of our experiments are available here.
| PPO in Lux | during training |
|---|---|
→ There are two main scripts, of ~1000 and ~900 lines of code: `src/envs_folder/custom_env.py` and `src/ppo_res_gridnet_multigpu.py`.
→ The repository contains many variations of the gridnet scripts, but the simplest and fully commented one is `src/ppo_res_gridnet_multigpu.py`.
To train our gridnet in Lux:

- Clone this repository
- Install the requirements
- Train Gridnet (example uses 1 GPU and 1 process):

```bash
cd src
torchrun --standalone --nproc_per_node 1 ppo_res_gridnet_multigpu.py --device-ids 0
```
The best agent was trained using the best parameters discovered in the hyperparameter sweep and 16 processes on 8 GPUs, running:

```bash
torchrun --standalone --nproc_per_node 16 ppo_pixel_gridnet_multigpu.py \
  --total-timesteps 1000000000 \
  --clip-coef=0.14334778465053272 \
  --ent-coef=0.002408486638907176 \
  --gae-lambda=0.9322312137190516 \
  --gamma=0.9945973988514306 \
  --learning-rate=0.0016166261475302418 \
  --max-grad-norm=0.28978755223510055 \
  --minibatch-size=128 \
  --num-envs=256 \
  --num-steps=64 \
  --pool-size=5 \
  --save-every=50 \
  --update-epochs=7 \
  --vf-coef=0.2734614814048212 \
  --device-ids 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7
```
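The `--device-ids` list places two of the 16 worker processes on each of the 8 GPUs. A minimal sketch of how such a mapping could be applied, assuming the script reads the standard `LOCAL_RANK` environment variable that `torchrun` sets for each worker (the actual logic lives in the training script):

```python
import os
import torch

# torchrun sets LOCAL_RANK for each of the spawned worker processes.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))

# --device-ids 0 0 1 1 ... 7 7 places two worker processes on each GPU.
device_ids = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
device = torch.device(f"cuda:{device_ids[local_rank]}")
```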
In this project we implement an RL agent to compete in the Lux AI v2 Kaggle competition. Lux is a 1-vs-1 real-time-strategy game in which players compete for resources and grow lichen on Mars. Lux is a multi-agent environment because each player controls a variable-sized fleet of units of different types (e.g. light robots, heavy robots, and factories). The full specification of the Lux environment is available here.
We propose a pixel-to-pixel architecture that we train with Proximal Policy Optimization (PPO). The encoder is a stack of Residual Blocks with Squeeze-and-Excitation layers and ReLU activations, and the decoders are stacks of Transposed Convolutions with ReLU activations. The critic uses an AveragePool layer followed by 2 fully connected layers with a ReLU activation.
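The following is a minimal PyTorch sketch of this type of pixel-to-pixel actor-critic. The module names, channel sizes, and number of per-cell actions are assumptions for illustration; the exact model is defined in `src/ppo_res_gridnet_multigpu.py`.

```python
import torch
import torch.nn as nn

class SEResidualBlock(nn.Module):
    """Residual block with a Squeeze-and-Excitation layer (illustrative sketch)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        # Squeeze-and-Excitation: global pool -> bottleneck MLP -> channel gates
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        out = torch.relu(self.conv1(x))
        out = self.conv2(out)
        out = out * self.se(out)       # re-weight channels
        return torch.relu(out + x)     # residual connection

class GridnetActorCritic(nn.Module):
    """Pixel-to-pixel actor-critic: shared SE-residual encoder, transposed-conv
    decoder producing per-cell action logits, and a pooled critic head."""
    def __init__(self, in_channels, hidden=64, actions_per_cell=12):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, stride=2, padding=1),
            nn.ReLU(),
            SEResidualBlock(hidden),
            SEResidualBlock(hidden),
        )
        # Decoder: transposed convolution upsamples back to the map resolution
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden, hidden, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, actions_per_cell, 1),
        )
        # Critic: average-pool the latent map, then two fully connected layers
        self.critic = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        z = self.encoder(obs)
        logits = self.decoder(z)   # (B, actions_per_cell, H, W) per-pixel logits
        value = self.critic(z)     # (B, 1) state value
        return logits, value
```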
If you use this code, please cite it as follows:
```bibtex
@article{castanyer2023centralized,
  title={Centralized control for multi-agent RL in a complex Real-Time-Strategy game},
  author={Castanyer, Roger Creus},
  journal={arXiv preprint arXiv:2304.13004},
  year={2023}
}
```