Beta-MCTS: a soft hybrid of MCTS and Minimax (wip, not for merge) #963
Revert "New u"
Update from master
Patch for PR925
…re displayed in --verbose-move-stats, but not used in search
…ingle header file for compiler reasons
Interesting - would you mind giving a little more high-level description of how search would change? I am guessing that this would reduce the extreme focus on one move that we get with MCTS when one move emerges with clear best Q in the first 100k nodes or so, or rather it would reduce it as node count gets big. From playing ICCF games, I know that this property is often ridiculously extreme to the point that it seems worrisome.
Install Eigen library into the Docker image used by CircleCi (LeelaChessZero#1412)
merge master
merge policy scaling into beta-mcts
merge master into betamcts
I am implementing a promising idea for an MCTS/Minimax hybrid which basically does a soft transition from standard MCTS to Minimax. Unlike UCT, which theoretically converges to Minimax as well (but without any reasonable bound on the nodes spent...), beta-MCTS actually gives (nearly) Minimax evals for nodes with enough visits.
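To make the idea of a soft transition concrete, here is a minimal sketch that blends the usual visit-weighted average with a minimax value. The blending weight, the `ChildStats` struct and all function names are assumptions made for illustration only. This is not the formula used in this PR, and it ignores the percentile parameter entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical child summary: mean value q in [-1, 1] as seen from the
// parent, and the number of visits n spent on that child.
struct ChildStats {
  double q;
  int n;
};

// Standard MCTS backup: visit-weighted average of the child values.
double AverageBackup(const std::vector<ChildStats>& children) {
  double sum = 0.0;
  long total = 0;
  for (const auto& c : children) {
    sum += c.q * c.n;
    total += c.n;
  }
  return total > 0 ? sum / total : 0.0;
}

// Minimax backup: value of the best visited child (0.0 if none is visited).
double MinimaxBackup(const std::vector<ChildStats>& children) {
  bool found = false;
  double best = 0.0;
  for (const auto& c : children) {
    if (c.n <= 0) continue;  // ignore unvisited children
    best = found ? std::max(best, c.q) : c.q;
    found = true;
  }
  return best;
}

// Illustrative soft transition (an assumption, NOT the PR's formula):
// the weight on the minimax value grows with the total visit count,
// scaled by `trust`. With trust == 0 this is the plain MCTS average.
double SoftBackup(const std::vector<ChildStats>& children, double trust) {
  assert(trust >= 0.0);
  long total = 0;
  for (const auto& c : children) total += c.n;
  const double w = 1.0 - 1.0 / (1.0 + trust * total);
  return (1.0 - w) * AverageBackup(children) + w * MinimaxBackup(children);
}
```

With two equally visited children valued 0.0 and -1.0, the average backup gives -0.5 and the minimax backup gives 0.0. The blended value lies between the two and moves toward 0.0 as `trust` or the visit count grows.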
The speed and softness of this transition are controlled by two parameters, --betamcts-trust and betamcts-percentile. The standard values are 0.1 and 0.05 (which is rather soft); the allowed values are [0.0, 1000] and [0.0, 0.5]. Too high values of trust will result in blind spot traps, as the NN evals can't be trusted in a minimax way. A lower percentile will in general put a bit more weight on well explored moves with suboptimal evals, a higher percentile on moves with near optimal evals. Setting either parameter to 0.0 will result in standard MCTS behavior without a transition to Minimax.
There are 4 levels of --betamcts-level use, affecting how deeply beta-MCTS is used during search, though level 3 isn't implemented yet.
Please note that the size of Node increased from 80 to 88 bytes, though I have no idea if that is actually relevant :) Also, I expect quite a slowdown on the random backend because of some additional CPU operations; any numbers on this would be interesting. I also don't know if there is any interaction with the GetVisitsToReachU() logic, especially with the not yet implemented level 3.
By design, the biggest differences in displayed eval are to be expected in biased tactical lines where only one side can deviate. This especially includes perpetual checks in losing positions, where one side has to answer the checks while the other can deviate to a losing position.
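For reference, the allowed ranges and defaults for trust and percentile given in the description, together with the rule that a value of 0.0 falls back to standard MCTS, can be written down as a small check. The `BetaMctsParams` struct and both helper names are hypothetical and are not taken from the PR's code.

```cpp
#include <cassert>

// Hypothetical container for the two tuning parameters named in this PR.
struct BetaMctsParams {
  double trust = 0.1;        // standard value stated in the description
  double percentile = 0.05;  // standard value stated in the description
};

// True if both values lie in the allowed ranges [0.0, 1000] and [0.0, 0.5].
bool IsValid(const BetaMctsParams& p) {
  return p.trust >= 0.0 && p.trust <= 1000.0 &&
         p.percentile >= 0.0 && p.percentile <= 0.5;
}

// True if the settings fall back to standard MCTS, which the description
// says happens when either parameter is set to 0.0.
bool IsStandardMcts(const BetaMctsParams& p) {
  assert(IsValid(p));
  return p.trust == 0.0 || p.percentile == 0.0;
}
```

The defaults pass `IsValid` and do not count as standard MCTS. Setting `trust` or `percentile` to 0.0 makes `IsStandardMcts` return true.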
I hope that no faulty behavior escaped my testing, but please let me know if you find anything and I'll look into it. Also, this already comes with the updated PR918 (and of course PR925), so feel free to use --logit-q and --new-u-enabled as well.