Beta-MCTS: a soft hybrid of MCTS and Minimax (wip, not for merge) #963

Open
wants to merge 234 commits into
base: master

Contributor

@Naphthalin Naphthalin commented Oct 6, 2019

I am implementing a promising idea for an MCTS/Minimax hybrid that makes a soft transition from standard MCTS to Minimax. UCT also converges to Minimax in theory, but without any reasonable bound on the number of nodes required; beta-MCTS actually gives (nearly) Minimax evals for nodes with enough visits.

The speed and softness of this transition are controlled by two parameters, --betamcts-trust and --betamcts-percentile. The default values are 0.1 and 0.05 (which is rather soft), and the allowed ranges are [0.0, 1000] and [0.0, 0.5], respectively. Too high a trust value will result in blind-spot traps, since the NN evals can't be trusted in a minimax way. A lower percentile will in general put a bit more weight on well-explored moves with suboptimal evals, a higher percentile on moves with near-optimal evals. Setting either parameter to 0.0 results in standard MCTS behavior without a transition to Minimax.
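
To make the idea of a "soft transition" more concrete, here is a toy Python sketch. It is not the formula used in this PR: the function name `soft_backup`, the exponential down-weighting, and the way `trust` enters are all assumptions made only for illustration. It shows the general shape described above: with `trust` set to 0.0 the parent value is the usual visit-weighted average, and with more trust (or more visits) the weight concentrates on the best child, approaching a Minimax value.

```python
import math


def soft_backup(children, trust):
    """Toy soft blend between visit-weighted averaging and minimax.

    children: list of (q, visits) pairs, q seen from the parent's side.
    trust:    hypothetical sharpness parameter (NOT the PR's exact semantics).
    """
    if sum(n for _, n in children) == 0:
        raise ValueError("need at least one visited child")

    # Best eval among children that have actually been visited.
    q_best = max(q for q, n in children if n > 0)

    weights = []
    for q, n in children:
        # Assumed rule: a child is discounted more strongly the worse its
        # eval is compared to the best child and the more visits back up
        # that judgement. trust == 0 leaves the plain visit count.
        weights.append(n * math.exp(-trust * n * (q_best - q)))

    norm = sum(weights)  # > 0, because the best child keeps weight n
    return sum(w * q for w, (q, _) in zip(weights, children)) / norm


children = [(0.5, 100), (0.0, 100)]
plain = soft_backup(children, trust=0.0)   # visit-weighted average
soft = soft_backup(children, trust=0.1)    # pulled towards the best child
```

In this toy version a very large `trust` makes the result follow the single best child almost immediately, which is the situation the description warns about: one overly optimistic NN eval would then dominate the parent value.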

There are 4 levels of --betamcts-level use, affecting how deeply beta-MCTS is used during search, though level 3 isn't implemented yet.

Please note that the size of Node increased from 80 to 88 bytes, though I have no idea whether that is actually relevant :) I also expect quite a slowdown on the random backend because of some additional CPU operations; any numbers on this would be interesting. I also don't know whether there is any interaction with the GetVisitsToReachU() logic, especially with the not-yet-implemented level 3.

By design the biggest differences in displayed eval are to be expected in biased tactical lines where only one side can deviate. This especially includes perpetual checks in losing positions, where one side has to answer the checks while the other can deviate to a losing position.
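
A small numeric illustration of the perpetual-check case, with made-up evals and visit counts (none of these numbers come from an actual search): at a node where the checking side is to move, continuing the perpetual holds the draw, while every deviation loses. Averaging over all visited moves drags the displayed eval below the draw score, whereas a Minimax-style eval stays at the draw.

```python
def averaged_eval(children):
    """Visit-weighted average of child evals (standard MCTS backup)."""
    total = sum(n for _, n in children)
    return sum(q * n for q, n in children) / total


def minimax_eval(children):
    """Eval of the best visited child (minimax backup)."""
    return max(q for q, n in children if n > 0)


# Hypothetical (q, visits) pairs from the checking side's point of view:
# keep checking (draw), or deviate into a losing position.
checking_side_moves = [(0.0, 800), (-0.9, 150), (-0.95, 50)]

avg = averaged_eval(checking_side_moves)  # about -0.18
mm = minimax_eval(checking_side_moves)    # 0.0
```

The gap between the two values comes only from the visits spent on the losing deviations, which is why such one-sided lines are where the displayed evals should differ most.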

I hope that no faulty behavior escaped my testing, but please let me know if you find anything and I'll look into it. Also, this already comes with the updated PR918 (and of course PR925), so feel free to use --logit-q and --new-u-enabled as well.

@borg323 borg323 added the wip Work in progress label Oct 6, 2019
@jhorthos
Contributor

jhorthos commented Oct 7, 2019

Interesting - would you mind giving a little more high-level description of how search would change? I am guessing that this would reduce the extreme focus on one move that we get with MCTS when one move emerges with clear best Q in the first 100k nodes or so, or rather it would reduce it as node count gets big. From playing ICCF games, I know that this property is often ridiculously extreme to the point that it seems worrisome.

@Naphthalin Naphthalin added the not for merge (Experimental code which is not intended to be merged into the master) and demo (Code/concept demonstration. Implies not for merge, won't be closed without consulting author.) labels and removed the wip (Work in progress) label Nov 2, 2022
Labels
demo Code/concept demonstration. Implies not for merge, won't be closed without consulting author. not for merge Experimental code which is not intended to be merged into the master
8 participants