Skip to content

Latest commit

 

History

History
36 lines (24 loc) · 1.95 KB

Monte-Carlo-Tree-Search.md

File metadata and controls

36 lines (24 loc) · 1.95 KB

MONTE-CARLO-TREE-SEARCH

AIMA4e

function MONTE-CARLO-TREE-SEARCH(state) returns an action
 tree ← NODE(state)  while TIME-REMAINING() do
   tree ← PLAYOUT(tree)
return the move in ACTIONS(state) with highest Q(state,move)


function PLAYOUT(tree) returns updated tree
nodetree
while node is not terminal and was already in tree do
   move ← SELECT(node)
   node ← FOLLOW-LINK(node,move)
outcome ← SIMULATION(node.STATE)
 UPDATE(node,outcome)
return tree


function SELECT(node) returns an action
return argmaxm ∈ FEASIBLE-ACTIONS(node) UCB(RESULT(node,m))


function UCB(child) returns a number
return child.VALUE + C ×


FIGURE ?? The Monte Carlo tree search algorithm. A game tree, tree, is initialized, and then grows by one node with each call to PLAYOUT. The function SELECT chooses a move that best balances exploitation and exploration according to the UCB formula. FOLLOW-LINK traverses from the current node by making a move; this could be to a previously-seen node, or to a new node that is added to the tree. Once we have added a new node, we exit the while loop and SIMULATION chooses moves (with a randomized policy that is designed to favor good moves but to compute quickly) until the game is over. Then, UPDATE updates all the nodes in the tree from node to the root, recording the fact that the path led to the final outcome.