playing to learn to play
Azalea is a reinterpretation of the AlphaZero game AI learning algorithm for the Hex board game.
- Install (requires Python 3.6, virtualenv recommended):

  ```
  pip install azalea
  ```

- Download the pre-trained model `hex11-20180712-3362.policy.pth`.
- Play against the pre-trained model:

  ```
  azalea-play hex11-20180712-3362.policy.pth
  ```

  You can use the `--first` option if you wish to play first.
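Putting the steps together, a full quick-start session could look like the following. This is only a sketch: the virtualenv name is arbitrary, and the model file is assumed to have already been downloaded into the current directory.

```
python3.6 -m venv azalea-env
source azalea-env/bin/activate
pip install azalea
azalea-play hex11-20180712-3362.policy.pth --first
```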
- Straightforward reimplementation of the AlphaZero algorithm except for MCTS parallelization (see below)
- Pre-trained model for the Hex board game
- Fast MCTS implementation through Numba JIT acceleration (see the sketch after this list)
- Fast Hex move generation, also accelerated with Numba
- Parallelized self-play to saturate an Nvidia V100 GPU during training
- AI policy evaluation through a round-robin tournament, also parallelized
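To give a rough idea of what the Numba acceleration looks like in practice, here is a hypothetical jitted inner loop over a flat board array. It is only an illustration of the technique, not the project's actual move generator; the board encoding (0 = empty) and the function name are assumptions.

```python
import numpy as np
from numba import njit

@njit(cache=True)
def empty_cells(board):
    """Collect indices of empty cells on a flattened Hex board.

    Hypothetical helper: assumes an int8 board array where 0 marks an
    empty cell. Numba compiles the loop to machine code, removing the
    Python interpreter overhead from this hot path.
    """
    moves = np.empty(board.size, dtype=np.int64)
    n = 0
    for i in range(board.size):
        if board[i] == 0:
            moves[n] = i
            n += 1
    return moves[:n]

# Example: legal moves on an 11x11 board with two stones placed.
board = np.zeros(11 * 11, dtype=np.int8)
board[0] = 1    # first player's stone
board[12] = 2   # second player's stone
print(empty_cells(board)[:5])
```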
- Tested on Ubuntu 16.04
- Requires Python 3.6 and PyTorch 0.4
- Single-GPU implementation only: tested on an Nvidia V100, with 8 CPUs for move generation and MCTS and 1 GPU for the policy network.
- The pre-trained model has smaller capacity than AlphaZero's: a ResNet with 6 blocks of 64 channels instead of 19 (or 39) blocks of 256 channels.
- Only the Hex game is implemented, though the code supports adding more games. Two components are needed for a new game: a move generator and a policy network, with the board input and move output adjusted to the new game.
- MCTS simulations are not run in parallel threads; instead, self-play games are played in parallel processes. This avoids the need for a multi-threaded MCTS implementation while still keeping training fast and saturating the GPU.
- MCTS simulations and board evaluations are batched according to the `search_batch_size` config parameter. "Virtual loss" is used, as in AlphaZero, to increase search diversity (see the sketch below).
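The following sketch illustrates how batching and virtual loss interact: each iteration selects `search_batch_size` leaves, adding a virtual loss along every visited edge so that later selections in the same batch are pushed toward different parts of the tree, then evaluates all the leaves in a single network call and reverts the virtual losses during backup. It is a simplified Python illustration under assumed names (`Node`, `evaluate_batch`), not the project's actual array-based, Numba-accelerated implementation; node expansion and the sign handling between the two players are omitted.

```python
import math

class Node:
    """Hypothetical MCTS node; the real code keeps statistics in flat arrays."""
    def __init__(self, prior):
        self.prior = prior        # prior probability from the policy network
        self.visits = 0           # N(s, a)
        self.value_sum = 0.0      # W(s, a)
        self.virtual_loss = 0     # pending selections within the current batch
        self.children = {}        # move -> Node

    def q(self):
        n = self.visits + self.virtual_loss
        # Each virtual loss counts as a lost playout, discouraging other
        # selections in the same batch from piling onto this edge.
        return (self.value_sum - self.virtual_loss) / n if n else 0.0

def select_child(node, c_puct=1.5):
    """PUCT selection that includes virtual losses in the visit counts."""
    total = 1 + sum(c.visits + c.virtual_loss for c in node.children.values())
    def score(child):
        u = c_puct * child.prior * math.sqrt(total) / (1 + child.visits + child.virtual_loss)
        return child.q() + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def run_batch(root, search_batch_size, evaluate_batch):
    """One batched iteration: gather leaves, evaluate them together, back up."""
    paths = []
    for _ in range(search_batch_size):
        node, path = root, []
        while node.children:
            _, node = select_child(node)
            node.virtual_loss += 1        # apply virtual loss along the path
            path.append(node)
        if path:
            paths.append(path)
    # Single batched call to the policy/value network for all selected leaves.
    values = evaluate_batch([path[-1] for path in paths])
    for path, value in zip(paths, values):
        for node in path:
            node.virtual_loss -= 1        # revert the virtual loss
            node.visits += 1
            node.value_sum += value
```

In this scheme the batched `evaluate_batch` call is what runs the policy network on the GPU for many positions at once, which is what keeps the GPU busy even though the search within each game is sequential.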
Clone the repository and install dependencies with Conda:

```
git clone https://github.com/jseppanen/azalea.git
cd azalea
conda env create -n azalea
source activate azalea
```

The default `environment.yml` installs GPU packages, but you can choose `environment-cpu.yml` for testing on a laptop.
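For example, assuming the environment files sit at the repository root, the CPU environment can be selected with conda's standard `-f` flag:

```
conda env create -n azalea -f environment-cpu.yml
```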
To play against a trained model, run:

```
python play.py models/hex11-20180712-3362.policy.pth
```

This will load the model and start playing, asking for your move. The columns are labeled a–k and the rows 1–11. The first player, playing X's, is trying to draw a vertical connected path through the board, while the second player, with O's, is drawing a horizontal path.
```
O O O O X . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . X . . . . . .
. . . . . X . . . . .
. . . . . . . . . . .
. . . . X . . . . . .
. . . . . . . . . . .
. . . X . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .

last move: e1
Your move?
```
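Moves are entered in the same column-row notation shown on the `last move` line; for example, answering the prompt with `f6` (a hypothetical move) would place your stone in column f, row 6:

```
Your move? f6
```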
Train a new model:

```
python train.py --config config/hex11_train_config.yml --rundir runs/train
```
Evaluate trained models against each other in a round-robin tournament:

```
python compare.py --config config/hex11_eval_config.yml --rundir runs/compare <model1> <model2> [model3] ...
```
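For instance, two training snapshots could be compared like this (the checkpoint file names are placeholders):

```
python compare.py --config config/hex11_eval_config.yml --rundir runs/compare \
    runs/train/model_a.policy.pth runs/train/model_b.policy.pth
```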
Tune hyperparameters:

```
python tune.py
```