Implement root move temperature #267
Conversation
Update to glinscott/next
Introduces the option to choose root moves by exponentiated visit count, to maintain randomness in games at much less of a strength cost than move selection proportional to visit count. Also implements a command line parameter --tempdecay (-d), which takes a decay constant as a parameter and dynamically reduces the temperature throughout the game (logarithmic decay).
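As a rough illustration of the selection rule described above (a minimal sketch under assumed names, not the PR's actual code): a root move is picked with probability proportional to visits^(1/t). With t = 1 this reproduces selection proportional to visit count, while t approaching 0 approaches always picking the most-visited move.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch (names are illustrative, not lc0's actual code):
// pick a root move index with probability proportional to
// visits^(1/temperature). temperature = 1 gives selection proportional
// to visit count; temperature -> 0 approaches picking the max-visit move.
std::size_t pick_root_move(const std::vector<int>& visits,
                           double temperature,
                           std::mt19937& rng) {
    std::vector<double> weights;
    weights.reserve(visits.size());
    for (int v : visits) {
        weights.push_back(std::pow(static_cast<double>(v), 1.0 / temperature));
    }
    // discrete_distribution normalizes the weights into probabilities.
    std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
    return dist(rng);
}
```

Lowering the temperature sharpens the distribution: at t = 0.5 the weights are the squared visit counts, so low-visit moves are strongly suppressed relative to the top candidates.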
Implement root move temperature
This is still work in progress and has not yet been properly strength-tested. I will start to do that tomorrow, but in case other people would like to get a head start, I'm opening the pull request now. So far I have only tested, with (currently enhanced) debug output, that the feature works as intended for a few ply in the console with "go" commands. For the decay constant, something like 25 seems a reasonable value to me, but I'm happy if others weigh in there as well. I'd like to ask @killerducky and/or @Uriopass for a code review if possible.
A reintroduced bug in UCTNode.cpp breaks the continuous integration at the moment.
Fix revert in UCTNode.cpp
Strangely, there is still a problem with continuous integration, even after I fixed the accidentally reintroduced auto move = pos.get_move(); in UCTNode.cpp. Anyone have ideas what could be the problem?
I assume you mean this error:
Looks like you forgot to edit the header file here:
Why not just start the games from the matches with two random moves (4 ply)? If games are replayed with reversed colors this is fair, and it seems sufficiently "zero", given that lczero needs to learn to play good chess in any position, not just the starting position. Note: to get accurate error bars one needs to use the pentanomial model for the outcome of game pairs. See the section on "statistical analysis" here
Changes the declaration of randomize_first_proportionally in UCTNode.h to pass a temperature parameter.
Fix missing header file
Thanks @cn4750! Yes, I forgot to upload the change of the declaration of randomize_first_proportionally. Looks like the build is now passing...
Fixes tab indentations, adds several comments, increases the denominator in the root temperature calculation to allow slower decay schedules, and adds a floor value of 0.1 for root temperature.
Various fixes
I just uploaded several fixes to the PR, including indentation, comments, a floor value for the root temperature, and an increase in the denominator in the root temperature calculation. This changes the initial value I proposed for the match-game decay schedule from 5 to 25. Much smaller decay values may now actually be usable for self-play, but that should be carefully considered and tested before changing the current t=1 for all moves.
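The exact decay formula isn't reproduced in this thread, so the sketch below is an assumption about its shape only: a logarithmic decay in the ply count, scaled by the --tempdecay constant d (larger d = faster decay, matching the later test results), with the 0.1 floor mentioned above. The 50.0 denominator is a placeholder, not the PR's actual constant.

```cpp
#include <algorithm>
#include <cmath>

// Illustrative sketch only: the PR's exact formula and constants are not
// shown in this thread. Logarithmic decay in the ply count, scaled by the
// --tempdecay constant d, with a floor of 0.1 for the root temperature.
double get_root_temperature(int ply, double decay_constant) {
    if (decay_constant <= 0.0) {
        return 1.0;  // d = 0: no decay, plain t = 1 for the whole game
    }
    // The 50.0 denominator is an assumed placeholder; a larger denominator
    // gives a slower decay schedule for the same decay constant.
    double t = 1.0 / (1.0 + decay_constant *
                            std::log1p(static_cast<double>(ply)) / 50.0);
    return std::max(t, 0.1);  // floor so the temperature never reaches 0
}
```

With a shape like this, a larger decay constant pushes the temperature toward the 0.1 floor sooner, while d = 0 recovers plain t = 1 play.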
Can you show a few positions and the % each move will be played in them? So we can get an idea what the spread looks like.
One example for t=0.48 from the (now removed) detailed debug output in the initial PR. This position is after 1. e4 Nc6 2. d4 e6:
In this particular example, there are two candidate moves with almost equal visit counts. Those two still get nearly equal selection chances at this root temperature, while the other candidate moves are already strongly suppressed (only ~1.5 % chance for all of them together, while with t=1 it would be 22.6 %). I didn't create detailed debug output specifically for the selection chances, but you can see them by comparing exponentiated visit counts.
I don't know the position, but it looks like this is moving the Bishop and letting it get captured for free. I'd like a solution that sets the probability of this being selected to zero. I'd like to collect several positions, analyze them, and set a floor. Again I don't know the position here, but setting the floor to include maybe Be2 but not c3 seems reasonable. Edit: A floor can be implemented by picking only the top X% after exponentiating the visits, and/or excluding moves whose V% is more than X% below the top move.
I included the move history now. You're right, Bg5 lets the bishop be captured by the black queen, and the value head is fully aware of that fact. However, I think a 3 in a million chance for this to actually happen is acceptable. Eventually, with stronger nets Bg5 would not get a visit anymore in this position since the policy prior will continue to drop. The tournament I'm currently running shows that with -d 25, this branch is actually stronger than current Lc0 with Dirichlet noise:
Later in the game (not that much later, actually), 1-visit moves are completely suppressed by single-precision floating-point accuracy anyway. I think an additional hard lower cap is unnecessary at this point.
With noise on, it will pretty much always visit every root move at least once. What are the arguments against adding a floor? It should be very easy and I don't see any downside. 1 in a million is extreme, but there will be 1-in-a-thousand moves too that are pretty bad. We have over a thousand people using this; it's going to play those moves a lot, and we're going to get support questions about them. I think the goal should be a solution that's worthy of putting into a real tournament like TCEC, so it should pick only moves deemed acceptable alternatives by the search. A mathematical definition of "acceptable" is what this PR should be looking for.
There's no point in using noise in evaluation games anymore with fractional temperature. I'd kind of like a solution where decay constant 0 recovers the current t=1 behaviour, which a floor doesn't help with. I suppose a floor where the exponentiated visit count is set to zero for any move where it's less than 1/1000 of the PV's should work somewhat like you describe: this would already set the selection probability for 1-visit blunders to zero at t~0.5 (which is reached by ply 4 when using decay constant 25). For t=1, it would remove 1-visit nodes from selection as long as the total visit count is larger than 1000. Would this impact self-play games negatively in any way? I doubt it personally, but we should discuss... Edit: I'll put in the floor of 1/1000 if a few other people weigh in on this. @glinscott, @Error323, @Uriopass, any opinions?
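The proposed 1/1000 floor can be sketched as follows (illustrative only, with invented names; the thread does not show the merged code): after exponentiating the visit counts, zero out any weight below 1/1000 of the best move's weight, so rare low-visit blunders can never be selected.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sketch of the proposed floor (hypothetical helper, not the PR's code):
// exponentiate visits by 1/temperature, then zero out any move whose
// weight falls below 1/1000 of the PV (best) move's weight.
std::vector<double> exponentiated_visits_with_floor(
        const std::vector<int>& visits, double temperature) {
    std::vector<double> weights;
    weights.reserve(visits.size());
    double best = 0.0;
    for (int v : visits) {
        double w = std::pow(static_cast<double>(v), 1.0 / temperature);
        weights.push_back(w);
        best = std::max(best, w);
    }
    for (double& w : weights) {
        if (w < best / 1000.0) w = 0.0;  // suppress near-zero candidates
    }
    return weights;
}
```

Note how the cutoff interacts with temperature: at t = 1 a 1-visit move survives until the PV has over 1000 visits, while at t = 0.5 an 800-visit PV already suppresses it, which matches the behaviour described in the comment above.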
I think for now we should not impact self-play at all. No noise, I guess that can work. It seems this information is important enough to integrate into the main dump_stats output. We should be able to look at logs of games to determine which moves it is picking and why. How about after printing the unaltered V count, print something like:
Then users can see what % each move will be picked without doing math. |
OK I'm going to implement this, good idea. Currently, root output looks like this:
So should I put in an additional column, e.g.
where S shows the root move selection probabilities? Or directly after visit count, like
In this case, I might have to either include 5 decimal places so the fluke moves aren't showing as 0.00%, or actually implement some sort of floor after all...
@killerducky I thought about how to implement your suggestion to include the move probabilities in the main dump_stats. Unfortunately, I think we need some refactoring of both the root_temperature calculation in UCTSearch::get_best_move() and the exponentiated visit count calculation in UCTNode::randomize_first_proportionally, since both calculations are currently performed locally. Do you have suggestions for how exactly to get those stats into dump_stats without duplicating both calculations? Edit: I guess what would work is to define new functions UCTNode::get_expvisits(tau) and UCTSearch::get_root_temp(). I'll put this refactoring in later (unless you'd like to do it first 😀 )
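One way the factored-out computation could look (a hypothetical sketch with invented names, not the actual refactoring): compute the normalized selection probabilities once and reuse them both for move selection and for the extra dump_stats column, printed with enough decimal places that fluke moves don't show as 0.00%.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper (illustrative names): normalize exponentiated visit
// counts into root move selection probabilities.
std::vector<double> root_move_probabilities(const std::vector<int>& visits,
                                            double temperature) {
    std::vector<double> probs(visits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < visits.size(); ++i) {
        probs[i] = std::pow(static_cast<double>(visits[i]), 1.0 / temperature);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;
    return probs;
}

// Sketch of the extra dump_stats column: unaltered visit count N, then the
// selection probability S with 5 decimal places.
void dump_probability_column(const std::vector<int>& visits, double temperature) {
    auto probs = root_move_probabilities(visits, temperature);
    for (std::size_t i = 0; i < probs.size(); ++i) {
        std::printf("N: %6d  S: %9.5f%%\n", visits[i], probs[i] * 100.0);
    }
}
```

Factoring the probability calculation into a single helper avoids duplicating the exponentiation logic between randomize_first_proportionally and dump_stats.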
@killerducky It's important to keep in mind that the moves played in self-play are not the ones the policy trains toward, so it's important to keep them in the training data, because they lead to a variety of positions that the MCTS encounters in a normal search. Since MCTS is a local policy improvement operator, we need to include all positions in some neighborhood of the target population, or else the network might forget how to evaluate bad positions. Anyway, in this case it will make a negligible difference (3 in a million), but IMO setting a "threshold" cannot be beneficial and may even hurt.
@Akababa this mode is only intended for matches, not self-play. For the reason you just mentioned. |
@jkiliani are you talking about the threshold mode? Wouldn't matches be played with tau=0 (pick max)? |
@Akababa At the moment, matches are played with tau=0, but that necessitates using Dirichlet noise to avoid deterministic play. The whole point behind this pull request is to replace Dirichlet noise in match games and probably play against humans with fractional temperature, to get more variability in positions ideally without a loss in strength. So far it looks like this is succeeding with a temperature decay constant of 25: The performance of the engine with decaying temperature is almost equal to that with Dirichlet noise, both against each other and against Stockfish Level 10. |
src/Parameters.cpp
Outdated
@@ -60,6 +61,7 @@ bool cfg_tune_only;
float cfg_puct;
float cfg_softmax_temp;
float cfg_fpu_reduction;
float cfg_root_temp;
It looks like we don't change this value anywhere? So could just inline the 1.0 in the one place it's used.
We could, I suppose; it's a question of whether we want constants all collected in the Parameters file or just defined where we use them.
I was actually considering pulling the Dirichlet noise constants alpha and epsilon into Parameters.cpp, but is your general preference towards having only multiple-use constants in the Parameters file?
I thought about it, changing cfg_root_temp would actually be rather impractical. Might be better to just get rid of it.
Just finished the first strength testing of this pull request. I matched Id 103 with current /next branch and Dirichlet noise against this branch without noise but a temperature decay schedule with -d 25, and Stockfish Level 10 as an outside source of comparison. Lc0 was given 800 visits each.
I'd call this a very successful first test, since at least with this temperature decay schedule no strength loss compared to using Dirichlet noise for randomness is measurable, while the opening variety is much better. As an aside, Id 103 seems decisively better than Stockfish Lv 10. I'm now running a second strength test with Id 107, this time matching multiple decay schedules against each other. About the code: I removed cfg_root_temp and will try to implement @killerducky's suggestion, which requires a refactoring of the root temperature calculation. When the addition to dump_stats is in and the second test is finished, I would consider this ready.
Removes cfg_root_temp from Parameters.cpp as configurable parameter. Factors out get_root_temperature as a function. Includes the probability to play each root move in UCTSearch::dump_stats, if temperature is used.
Remove cfg_root_temp, enhance dump_stats
I have now put in the root move probabilities for when temperature is used. I tried to generalise it to also handle the case with no temperature set (i.e. show 100 % for the first root move and 0 % for all others) but didn't manage that part. If someone else (@killerducky?) would like to optimise this, please feel free.
Awesome, thanks so much! |
Current standing in my tournament with Id 107:
The tournament is still running and will probably finish sometime this evening, but I think I can already draw some conclusions from the data so far: temperature decay works robustly for multiple decay schedules, and the strength loss from using a slower decay schedule than my original test with -d 25 is surprisingly small. Even -d 5 still appears to be a realistic choice. --tempdecay offers a tuneable trade-off between playing strength and variety of play, which should prove popular both for matches and for people running bots against other engines and against humans.
The tournament for Id 107 is finished now:
In the end, there was a small strength cost to using temperature decay, even with d=25. The performance of d=5 is strange and probably simply due to still rather weak statistics, but it seems safe to say that d=5 is still a valid option. I'm going to do one more match of this format, replacing d=10 with d=50 to look at the performance of faster temperature decay. |
And finally, the tournament for Id 112:
I think this proves that with a quick enough temperature decay schedule, a loss in strength can be entirely avoided, since the cost of temperature is offset by no longer using Dirichlet noise. I would still recommend a somewhat slower decay schedule than -d 50 for match games to get more opening variety; how about -d 10? With these results, I would consider the temperature decay implementation sufficiently tested now to be used. Outside of "official" uses for matches, the choice of the decay constant is up to the user. I did not test what the current strength cost of Dirichlet noise is compared to no randomness at all, but picking a really large decay constant will likely closely approximate that already.
I think we should start small, see how big a problem duplicate games are, and gradually increase until it's no longer a problem. It should be pretty easy for Gary to try, say, d=25 at the start, then go to d=15 a day later, and if still necessary to d=10, but I think we don't really need much variety with only 400-500 games. SF uses a 4-ply book just fine for even 50,000 games, without extra noise or even the usual multi-threaded variability.
Implements root move selection by exponentiated visit count, to maintain randomness in games with much less strength loss than is caused by root move temperature = 1 for the whole game. Also introduces a logarithmic decay schedule, initialised by the new command line parameter --tempdecay (-d), which reduces the root move temperature throughout the game. The decay constant is customisable.