Hacky research code that trains policies for the CodeCraft real-time strategy game with proximal policy optimization.
Blog post: Mastering Real-Time Strategy Games with Deep Reinforcement Learning: Mere Mortal Edition
- Python >= 3.7, pip
- CodeCraft Server
Install dependencies with
pip install -r requirements.txt
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.6.0+${CUDA}.html
where ${CUDA}
should be replaced by either cpu
, cu92
, cu101
or cu102
depending on your PyTorch installation.
If you want the training code to record metrics to Weights & Biases, run wandb login
.
The first step is to setup and run CodeCraft Server.
To train a policy with the default set of hyperparameters, run:
EVAL_MODELS_PATH=/path/to/golden-models python main.py --hpset=standard --out-dir=${OUT_DIR}`
Logs and model checkpoints will be written to the ${OUT_DIR}
directory.
If you want policies to be evaluted against a set of fixed opponents during training, download the required checkpoints available here to the right subfolder in the folder specified by EVAL_MODEL_PATH
.
For evaluations with the standard config, you need standard/curious-galaxy-40M.pt
and standard/graceful-frog-100M.pt
.
To disable evaluation of the policy during training, set --eval_envs=0
.
To see additional options, run python main.py --help
and consult hyperparams.py.
To run games with already trained policies, run:
python showmatch.py /path/to/policy1.pt /path/to/policy2.pt --task=STANDARD --num_envs=64
You can then watch the games at http://localhost:9000/observe?autorestart=true&autozoom=true.
The job runner allows you to schedule and execute many runs in parallel. The command
python runner.py --jobfile-dir=${JOB_DIR} --out-dir=${OUT_DIR} --concurrency=${CONCURRENCY}
starts a job runner that watches the ${JOB_DIR}
directory for new jobs, writes results to folders created in ${OUT_DIR}
and will run up to ${CONCURRENCY}
experiments in parallel.
You can then schedule jobs with
python schedule.py --repo-path=https://github.com/cswinter/DeepCodeCraft.git --queue-dir=${JOB_DIR} --params-file=params.yaml
where params.yaml
is a file that specifies the set of hyperparameters to use, for example:
- hpset: standard
adr_variety: [0.5, 0.3]
lr: [0.001, 0.0003]
- hpset: standard
repeat: 4
steps: 300e6
The repeat
parameter tells the job runner to spawn multiple runs.
When a hyperparameter is set to a list of different values, one experiment is spawned for each combination.
So above params.yaml
will spawn a total of 8 experiment runs, 4 of which will run for 300 million samples with the default set of hyperparameters, and one additional run for all 4 combinations of the adr_variety
and lr
hyperparameters.
The ${JOB_DIR}
may be on a remote machine that you can access via ssh/rsync, e.g. --queue-dir=192.168.0.101:/home/clemens/xprun/queue
.
@misc{DeepCodeCraft2020,
author = {Winter, Clemens},
title = {Deep CodeCraft},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/cswinter/DeepCodeCraft}}
}