Beta-MCTS: a soft hybrid of MCTS and Minimax (wip, not for merge) #963

Naphthalin · 2019-10-06T16:35:50Z

I am implementing a promising idea for an MCTS/Minimax hybrid that makes a soft transition from standard MCTS to Minimax. UCT also converges to Minimax in theory, but without any reasonable bound on the nodes spent; beta-MCTS actually gives (nearly) Minimax evals for nodes with enough visits.
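As a rough illustration of what a "soft" transition between averaging and Minimax can look like, here is a minimal sketch. It is not the code in this PR: the `Child` class, the `backed_up_q` function, and the idea of taking a ready-made `relevance` weight per child are assumptions made for the example, and how beta-MCTS actually derives its relevance values is not shown. The sketch also ignores the node's own NN eval, which a real MCTS average would include.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Child:
    q: float          # value estimate from the parent's point of view, in [-1, 1]
    visits: int       # number of visits spent on this child
    relevance: float  # hypothetical weight in [0, 1]; 1 = fully counted, 0 = ignored


def backed_up_q(children: List[Child]) -> float:
    """Average child values, weighting each child by visits * relevance.

    With relevance 1 everywhere this is the plain visit-weighted mean
    (MCTS-style backup). With relevance 1 for the best child and 0 for
    the others it returns the best child's value (Minimax-style backup).
    """
    weights = [c.visits * c.relevance for c in children]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one child needs a non-zero weight")
    return sum(w * c.q for w, c in zip(weights, children)) / total
```

Values of `relevance` between 0 and 1 move the result smoothly between the two extremes, which is the sense in which such a backup is "soft".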

Speed and softness of this transition are controlled by two parameters, --betamcts-trust and --betamcts-percentile. The default values are 0.1 and 0.05 (which is rather soft); the allowed ranges are [0.0, 1000] and [0.0, 0.5], respectively. Trust values that are too high will result in blind spot traps, as the NN evals can't be trusted in a minimax way. A lower percentile will in general put a bit more weight on well-explored moves with suboptimal evals, a higher percentile on moves with near-optimal evals. Setting either parameter to 0.0 results in standard MCTS behavior without a transition to Minimax.
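The ranges and defaults above can be summarized in a small sketch. The `BetaMctsParams` class and its `is_plain_mcts` property are hypothetical names used only for illustration; they are not part of lc0. Only the numeric defaults, the allowed ranges, and the "either parameter at 0.0 means standard MCTS" rule are taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaMctsParams:
    """Hypothetical container for the two options described above."""

    trust: float = 0.1        # --betamcts-trust, allowed range [0.0, 1000]
    percentile: float = 0.05  # --betamcts-percentile, allowed range [0.0, 0.5]

    def __post_init__(self) -> None:
        # Reject values outside the ranges stated in the description.
        if not 0.0 <= self.trust <= 1000.0:
            raise ValueError(f"trust must be in [0.0, 1000], got {self.trust}")
        if not 0.0 <= self.percentile <= 0.5:
            raise ValueError(
                f"percentile must be in [0.0, 0.5], got {self.percentile}"
            )

    @property
    def is_plain_mcts(self) -> bool:
        # Setting either parameter to 0.0 disables the transition to Minimax.
        return self.trust == 0.0 or self.percentile == 0.0
```

For example, `BetaMctsParams()` gives the soft defaults, while `BetaMctsParams(trust=0.0)` describes a search that behaves like standard MCTS.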

The --betamcts-level option has 4 levels, which control how deeply beta-MCTS is used during search; level 3 isn't implemented yet.

Please note that the size of Node increased from 80 to 88 bytes, though I have no idea whether that is actually relevant :) I also expect quite a slowdown on the random backend because of some additional CPU operations; any numbers on this would be interesting. I also don't know whether there is any interaction with the GetVisitsToReachU() logic, especially with the not-yet-implemented level 3.

By design the biggest differences in displayed eval are to be expected in biased tactical lines where only one side can deviate. This especially includes perpetual checks in losing positions, where one side has to answer the checks while the other can deviate to a losing position.
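A toy calculation can make the perpetual-check case concrete. This is only a sketch under made-up assumptions: the checking side either continues the perpetual (worth 0.0, a draw) or deviates into a lost position (worth -0.9), the defender's replies are forced, and a fixed share of visits at every checking node goes to the deviation. The function names and all numbers are illustrative and do not come from the PR.

```python
def averaged_value(depth: int, deviation_share: float,
                   deviation_q: float = -0.9, draw_q: float = 0.0) -> float:
    """Value for the checking side under plain visit-weighted averaging.

    At each of `depth` checking nodes, `deviation_share` of the visits go
    to the losing deviation and the rest continue the perpetual check.
    """
    value = draw_q  # the line ends in a draw by repetition
    for _ in range(depth):
        value = (1.0 - deviation_share) * value + deviation_share * deviation_q
    return value


def minimax_value(depth: int, deviation_q: float = -0.9,
                  draw_q: float = 0.0) -> float:
    """Value for the checking side when each node takes its best move."""
    value = draw_q
    for _ in range(depth):
        value = max(value, deviation_q)
    return value
```

With `depth=3` and `deviation_share=0.2`, the averaged value works out to about -0.44 for the checking side, while the Minimax value stays at 0.0. The gap grows with the length of the checking sequence, which is the kind of line where the displayed eval would be expected to change most.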

I hope that no faulty behavior escaped my testing, but please let me know if you find anything and I'll look into it. Also, this already comes with the updated PR918 (and of course PR925), so feel free to use --logit-q and --new-u-enabled as well.


jhorthos · 2019-10-07T17:37:33Z

Interesting - would you mind giving a little more high-level description of how search would change? I am guessing that this would reduce the extreme focus on one move that we get with MCTS when one move emerges with clear best Q in the first 100k nodes or so, or rather it would reduce it as node count gets big. From playing ICCF games, I know that this property is often ridiculously extreme to the point that it seems worrisome.


kiudee and others added 24 commits August 5, 2019 15:03

Implement equilibrium based U formula

4147a87

Fix calculation of Q+U for verbose stats

888dca7

Merge pull request #1 from kiudee/new_u

ea4d067

New u

Add logit_q switch in GetFpu

47fb35d

Add logit_q argument to GetFpu call

32afc30

Move scaling factor to common function.

7c97df2

Remove scaling factor

c027aa3

Remove scaling factor

0c9c61b

Add logit switch inside *Prefetch into cache*

ba12430

Change order of ternary arguments

b55d45d

Revert "New u"

9f3b4a9

Merge pull request #3 from Naphthalin/revert-1-new_u

59f10c3

Revert "New u"

Merge pull request #2 from LeelaChessZero/master

8fad165

Update from master

Merge pull request #4 from AlexisOlson/patch-7

9649bca

Patch for PR925

resolved conflicts in search.cc

bafd75d

resolved conflicts in search.cc

0787f17

resolved conflicts in search.cc

eba26f2

Bug fixes, compiles now

48d8c29

Implemented beta-mcts update of evals. effective N, Q and relevance are displayed in --verbose-move-stats, but not used in search

55e2886

external source for betainc and invbetainc functions. hacked into a single header file for compiler reasons

2544af3

implemented parameters for beta-mcts. behavior around terminal nodes faulty

36ba503

better position for relevance calculation

89bed73

terminal nodes are handled correctly now

55033fc

special case pure MCTS + terminal nodes fixed

7998f20

borg323 added the wip label Oct 6, 2019

Naphthalin added 3 commits October 6, 2019 23:47

allow relevance to be 0

2e0ff17

allow relevance to be 0 reverted

9ce7e5e

AppVeyor meson==0.51.2

ff3b64f

--betamcts-level=3 implemented. experimental :)

6fd937b

Naphthalin and others added 22 commits August 22, 2020 15:09

move ordering takes betamcts into account

e974f62

changed order of moves in verbosemovestats

c06642a

Merge pull request #20 from LeelaChessZero/master

449df55

Install Eigen library into the Docker image used by CircleCi (LeelaChessZero#1412)

order moves by estimated lcb

2ce1b0e

Merge branch 'beta-mcts' of https://github.com/Naphthalin/lc0 into beta-mcts

86c00d0

Merge pull request #21 from LeelaChessZero/master

6ee24d6

merge master

Merge branch 'beta-mcts' into april

e70be5d

Merge pull request #22 from Naphthalin/april

fce74d4

merge policy scaling into beta-mcts

changed default params

5851a7d

Merge branch 'beta-mcts' of https://github.com/Naphthalin/lc0 into beta-mcts

67eb8e6

fixing merge errors

143507f

fixed order of params in node.h::GetU()

9ba2b4f

more merge bug fixes

757b742

more merge bug fixes

f885762

activated cpuct scaling

a939cd9

increased value of percentile for LCB choice to 0.35

0172fb7

increased value of percentile for LCB choice to 0.42

c35c673

updated params

2dc51cd

fixed sign error

a51bd5e

don't to logarithms while tired

e6486b7

made --lcb-percentile configurable

89f5816

avoid division by zero if percentile is 0.0 or 1.0

2252c86

Naphthalin mentioned this pull request Oct 1, 2020

Blunder collection of recent T60 based MLH nets #1403

Closed

Merge pull request #23 from LeelaChessZero/master

deeabd7

merge master into betamcts

Naphthalin mentioned this pull request Oct 29, 2020

Implementing an analysis mode for Lc0 to allow forward/backward analysis without losing the tree. #1455

Closed

Naphthalin mentioned this pull request Mar 8, 2021

Analyse Mode + minimax/mcts hybrid #1543

Open

Naphthalin mentioned this pull request Apr 20, 2022

Extracting parts of Lc0's search into classes would help future development #1734

Open

Naphthalin added the not for merge and demo labels and removed the wip label Nov 2, 2022
