# Azalea

*playing to learn to play*

Azalea is a reinterpretation of the AlphaZero game AI learning algorithm for the Hex board game.

## Quick start

1. Install (requires Python 3.6, virtualenv recommended):

   ```
   pip install azalea
   ```

2. Download the pre-trained model `hex11-20180712-3362.policy.pth`
3. Play against the pre-trained model:

   ```
   azalea-play hex11-20180712-3362.policy.pth
   ```

You can pass the `--first` option if you wish to play first:
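```
# illustrative invocation; flag placement before the model path is assumed
azalea-play --first hex11-20180712-3362.policy.pth
```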

## Features

- Straightforward reimplementation of the AlphaZero algorithm, except for MCTS parallelization (see below)
- Pre-trained model for the Hex board game
- Fast MCTS implementation through Numba JIT acceleration
- Fast Hex move generation through Numba (see the sketch after this list)
- Parallelized self-play to saturate an Nvidia V100 GPU during training
- AI policy evaluation through a round-robin tournament, also parallelized
- Tested on Ubuntu 16.04
- Requires Python 3.6 and PyTorch 0.4
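As an illustration of the Numba-accelerated move generation mentioned above, here is a minimal sketch of a JIT-compiled legal-move scan for a board stored as a NumPy array. The function name and board encoding are assumptions for the example, not the repository's actual code:

```python
import numpy as np
from numba import njit

@njit(cache=True)
def legal_moves(board):
    """Return flat indices of all empty cells (0 = empty).

    Sketch only: the real generator may use a different board
    encoding and track connectivity incrementally.
    """
    size = board.shape[0]
    moves = np.empty(size * size, dtype=np.int64)
    n = 0
    for r in range(size):
        for c in range(size):
            if board[r, c] == 0:
                moves[n] = r * size + c
                n += 1
    return moves[:n]

board = np.zeros((11, 11), dtype=np.int8)  # empty 11x11 Hex board
print(legal_moves(board).shape)            # (121,) before any stones are placed
```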

## Differences from the published AlphaZero

- Single-GPU implementation only: tested on an Nvidia V100, with 8 CPUs for move generation and MCTS and 1 GPU for the policy network.
- The pre-trained model has smaller capacity: a ResNet with 6 blocks of 64 channels instead of 19 (or 39) blocks of 256 channels.
- Only the Hex game is implemented, though the code supports adding more games. A new game needs two components: a move generator and a policy network, with the board input and move output adjusted to the new game.
- MCTS simulations are not run in parallel threads; instead, self-play games are played in parallel processes. This avoids the need for a multi-threaded MCTS implementation while still maintaining fast training speed and saturating the GPU.
- MCTS simulations and board evaluations are batched according to the `search_batch_size` config parameter. "Virtual loss" is used, as in AlphaZero, to increase search diversity (see the sketch after this list).
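To make the batched search concrete, here is a minimal sketch of how virtual loss is typically applied in batched MCTS. The `Node` class, constants, and function names are illustrative assumptions, not the repository's actual data structures:

```python
from dataclasses import dataclass, field

VIRTUAL_LOSS = 1.0  # penalty per pending visit (illustrative value)

@dataclass
class Node:
    """Illustrative MCTS node, not the repo's actual layout."""
    prior: float = 1.0
    visits: int = 0
    value_sum: float = 0.0
    children: list = field(default_factory=list)

    def ucb_score(self, parent_visits, c_puct=1.5):
        q = self.value_sum / self.visits if self.visits else 0.0
        u = c_puct * self.prior * parent_visits ** 0.5 / (1 + self.visits)
        return q + u

def select_batch(root, search_batch_size):
    """Pick up to search_batch_size leaves for one batched network
    evaluation, applying a virtual loss along each selected path so
    later selections in the same batch explore different subtrees."""
    leaves = []
    for _ in range(search_batch_size):
        node, path = root, [root]
        while node.children:
            pv = max(node.visits, 1)
            node = max(node.children, key=lambda c: c.ucb_score(pv))
            path.append(node)
        for n in path:              # discourage re-selecting this path
            n.visits += 1
            n.value_sum -= VIRTUAL_LOSS
        leaves.append((node, path))
    return leaves

def backup_batch(leaves, values):
    """Undo the virtual losses and back up the real evaluations
    (sign flipping for alternating players omitted for brevity)."""
    for (node, path), value in zip(leaves, values):
        for n in path:
            n.value_sum += VIRTUAL_LOSS + value
```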

## Installation

Clone the repository and install dependencies with Conda:

```
git clone https://github.com/jseppanen/azalea.git
cd azalea
conda env create -n azalea
source activate azalea
```

The default `environment.yml` installs GPU packages, but you can choose `environment-cpu.yml` for testing on a laptop.
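For example, to create the CPU-only environment instead (run from the repository root):

```
conda env create -f environment-cpu.yml -n azalea
```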

## Playing against the pre-trained model

```
python play.py models/hex11-20180712-3362.policy.pth
```

This will load the model and start playing, asking for your move. The columns are labeled a–k and the rows 1–11. The first player, playing X's, tries to draw a vertically connected path through the board, while the second player, with O's, draws a horizontal path.

```
O O O O X . . . . . .
 . . . . . . . . . . .
  . . . . . . . . . . .
   . . . . X . . . . . .
    . . . . . X . . . . .
     . . . . . . . . . . .
      . . . . X . . . . . .
       . . . . . . . . . . .
        . . . X . . . . . . .
  x      . . . . . . . . . . .
 o\\      . . . . . . . . . . .
last move: e1
Your move?
```
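For reference, the win condition described above (an unbroken chain of one player's stones between their two sides) can be checked with a simple flood fill. This is a minimal sketch, assuming X stones are encoded as 1 and connect top to bottom; the repository's actual implementation may differ:

```python
import numpy as np

# the six hexagonal neighbor offsets on a parallelogram Hex board
NEIGHBORS = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0)]

def x_wins(board):
    """Flood fill from the top row: X wins if any chain of X stones
    reaches the bottom row. Sketch only; a real engine would track
    connectivity incrementally instead of rescanning the board."""
    size = board.shape[0]
    seen = set()
    stack = [(0, c) for c in range(size) if board[0, c] == 1]
    while stack:
        r, c = stack.pop()
        if (r, c) in seen:
            continue
        seen.add((r, c))
        if r == size - 1:
            return True
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size and board[nr, nc] == 1:
                stack.append((nr, nc))
    return False
```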

## Model training

```
python train.py --config config/hex11_train_config.yml --rundir runs/train
```

## Model comparison

```
python compare.py --config config/hex11_eval_config.yml --rundir runs/compare <model1> <model2> [model3] ...
```

## Model selection

```
python tune.py
```

## References