# AlphaUT3

AlphaZero for ultimate tic-tac-toe.

## What is ultimate tic-tac-toe?

It's like tic-tac-toe, but each square of the board contains another game of tic-tac-toe! Win small games to claim their squares in the big game. Simple, right? But there is a catch: whichever small square you pick determines the big square your opponent must play in next. Read more...

*(Animated example of an ultimate tic-tac-toe game.)*
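
The forwarding rule is the part that usually trips people up. Here is a minimal sketch of how the legal moves could be computed under it, assuming a hypothetical board representation (not this repository's): `boards[b][s]` holds 0 (empty), 1, or -1 for small square `s` of big square `b`, `closed[b]` is `True` once big square `b` is won or full, and `last_move` is the previous `(big, small)` pair.

```python
def legal_moves(boards, closed, last_move):
    """Return the (big, small) pairs the player to move may choose."""
    if last_move is not None:
        _, small = last_move               # the small square just played...
        if not closed[small]:              # ...forces the matching big square
            return [(small, s) for s in range(9) if boards[small][s] == 0]
    # First move, or the forced big square is already closed: play anywhere open.
    return [(b, s) for b in range(9) for s in range(9)
            if not closed[b] and boards[b][s] == 0]
```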

## What is AlphaZero?

AlphaZero is a reinforcement learning algorithm trained purely through self-play. It combines a neural network with Monte Carlo Tree Search in an elegant policy-iteration framework to achieve stable learning. Read more...

## Experiments

For testing purposes, consider a simple opponent: MinMax(n). Before each move, MinMax(n) expands the search tree from the current state of the board, simulating every possible move up to a depth of n moves and evaluating the value v of each resulting state: v is 1, -1, or 0 depending on whether the game is won, lost, or drawn respectively, and zero for non-terminal states. MinMax(n) applies the minimax rule to these values, so guaranteed wins up to n moves ahead are played and guaranteed losses are avoided. When no winning move is detected, a move is chosen randomly from the non-losing moves. MinMax(0) is therefore a completely random player.
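
Below is a minimal sketch of such an opponent, using ordinary 3x3 tic-tac-toe for brevity rather than ultimate tic-tac-toe; the names here are illustrative and not taken from this code base.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),          # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),          # columns
         (0, 4, 8), (2, 4, 6)]                     # diagonals

def value(board):
    """+1 if player 1 has won, -1 if player -1 has won, 0 otherwise."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def minmax(board, player, depth):
    """Value of `board` for player 1, with `player` to move, searching `depth` plies."""
    v = value(board)
    empties = [i for i, x in enumerate(board) if x == 0]
    if v != 0 or depth == 0 or not empties:
        return v                                    # win/loss, horizon, or full board
    children = [minmax(board[:i] + [player] + board[i + 1:], -player, depth - 1)
                for i in empties]
    return max(children) if player == 1 else min(children)

def minmax_move(board, player, n):
    """MinMax(n): play a guaranteed win if one exists, avoid guaranteed losses,
    and otherwise choose randomly; MinMax(0) is a completely random player."""
    empties = [i for i, x in enumerate(board) if x == 0]
    if n == 0:
        return random.choice(empties)
    scores = {i: minmax(board[:i] + [player] + board[i + 1:], -player, n - 1)
              for i in empties}
    best = max(scores.values()) if player == 1 else min(scores.values())
    if best == player:                              # a guaranteed win was found
        return random.choice([i for i, s in scores.items() if s == best])
    safe = [i for i, s in scores.items() if s != -player]
    return random.choice(safe or empties)           # prefer non-losing moves
```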

Experiments coming soon.

## To-do

## Requirements

## Thanks