
Implement root move temperature #267

Merged
merged 12 commits into glinscott:next on Apr 9, 2018

Conversation

jkiliani
Contributor

@jkiliani jkiliani commented Apr 8, 2018

Implements root move selection by exponentiated visit count, to keep randomness in games with much less strength loss than a root move temperature of 1 for the whole game. Also introduces a logarithmic decay schedule, enabled by the new command line parameter --tempdecay (-d), which reduces the root move temperature over the course of the game. The decay constant is customisable.
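The selection mechanism described above can be sketched roughly as follows (a minimal illustration, not the PR's actual code; `selection_probs` and `pick_root_move` are hypothetical names):

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Sketch: each root move is weighted by visits^(1/tau). tau = 1 reproduces
// selection proportional to raw visit counts; as tau -> 0 the pick becomes
// greedy (best move only). Names here are illustrative, not from the PR.
std::vector<double> selection_probs(const std::vector<int>& visits, double tau) {
    std::vector<double> probs(visits.size());
    double accum = 0.0;
    for (std::size_t i = 0; i < visits.size(); ++i) {
        probs[i] = std::pow(static_cast<double>(visits[i]), 1.0 / tau);
        accum += probs[i];
    }
    for (double& p : probs) p /= accum;  // normalise to selection probabilities
    return probs;
}

// Draw one move index according to those probabilities.
int pick_root_move(const std::vector<int>& visits, double tau, std::mt19937& rng) {
    auto probs = selection_probs(visits, tau);
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return dist(rng);
}
```

With visit counts like those in the debug dump below ({319, 300, 46, ...}) and tau = 0.48, the two near-equal top moves keep nearly equal weight while the low-visit tail is strongly suppressed.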

jkiliani added 4 commits April 6, 2018 15:10
Update to glinscott/next
Introduces the option to choose root moves by exponentiated visit count, to maintain randomness in games with much less of a strength cost than move selection proportional to visit count. Also implements a command line parameter --tempdecay (-d) which takes a decay constant to dynamically reduce temperature throughout the game (logarithmic decay).
Implement root move temperature
@jkiliani
Contributor Author

jkiliani commented Apr 8, 2018

This is still work in progress and has not yet been properly strength-tested. I will start on that tomorrow, but in case other people would like a head start, I'm opening the pull request now. So far I have only verified, with (currently enhanced) debug output, that the feature works as intended for a few ply in the console with "go" commands. For the decay constant, something like 25 seems a reasonable value to me, but I'm happy for others to weigh in as well.

I'd like to ask @killerducky and/or @Uriopass for a code review if possible.

jkiliani added 2 commits April 8, 2018 02:34
A reintroduced bug in UCTNode.cpp breaks the continuous integration at the moment.
@jkiliani
Contributor Author

jkiliani commented Apr 8, 2018

Strangely, there is still a problem with continuous integration, even after I fixed the accidentally reintroduced auto move = pos.get_move(); in UCTNode.cpp. Anyone have ideas what could be the problem?

@jkiliani jkiliani changed the title Implement root move temperature Implement root move temperature (WIP) Apr 8, 2018
@jkiliani jkiliani mentioned this pull request Apr 8, 2018
@cn4750

cn4750 commented Apr 8, 2018

I assume you mean this error:
/src/src/UCTNode.cpp:171:6: error: prototype for 'void UCTNode::randomize_first_proportionally(float)' does not match any in class 'UCTNode'
 void UCTNode::randomize_first_proportionally(float tau) {
      ^
In file included from /src/src/UCTNode.cpp:40:0:
/src/src/UCTNode.h:63:10: error: candidate is: void UCTNode::randomize_first_proportionally()
 void randomize_first_proportionally();
      ^
CMakeFiles/objs.dir/build.make:158: recipe for target 'CMakeFiles/objs.dir/src/UCTNode.cpp.o' failed
make[3]: *** [CMakeFiles/objs.dir/src/UCTNode.cpp.o] Error 1

Looks like you forgot to edit the header file here:
https://github.com/glinscott/leela-chess/blob/master/src/UCTNode.h#L61

@vdbergh

vdbergh commented Apr 8, 2018

Why not just start the games from the matches with two random moves (4 ply)? If games are replayed with reversed colors this is fair and it seems sufficiently "zero", given that lczero needs to learn to play good chess in any position, not just the starting position.

Note: to get accurate error bars one needs to use the pentanomial model for the outcome of game pairs. See the section on "statistical analysis" here

https://chessprogramming.wikispaces.com/Match+Statistics

jkiliani added 2 commits April 8, 2018 08:11
Changes the declaration of randomize_first_proportionally in UCTNode.h to pass a temperature parameter.
@jkiliani
Contributor Author

jkiliani commented Apr 8, 2018

Thanks @cn4750! Yes, I forgot to upload the change of the declaration of randomize_first_proportionally. Looks like the build is now passing...

@jkiliani jkiliani changed the title Implement root move temperature (WIP) [WIP] Implement root move temperature Apr 8, 2018
@jkiliani jkiliani mentioned this pull request Apr 8, 2018
jkiliani added 2 commits April 8, 2018 12:17
Fixes tab indentations, adds several comments, increases the denominator in the root temperature calculation to allow slower decay schedules, and adds a floor value of 0.1 for root temperature.
@jkiliani
Contributor Author

jkiliani commented Apr 8, 2018

I just uploaded several fixes in the PR, including indentations, comments, a floor value for root temperature and an increase in the denominator for the root temperature calculation.

This changes my proposed initial value for the decay schedule in match games from 5 to 25. Much smaller decay values may now actually be usable for self-play, but that should be carefully considered and tested before changing the current t=1 for all moves.

@killerducky
Collaborator

Can you show a few positions and the % each move will be played in them? So we can get an idea what the spread looks like.

@jkiliani
Contributor Author

jkiliani commented Apr 8, 2018

One example for t=0.48 from the (now removed) detailed debug output in the initial PR. This position is after 1. e4 Nc6 2. d4 e6:

info string  Bh6 ->       0 (V: 52.52%) (N:  0.09%) PV: Bh6 
info string  Ba6 ->       0 (V: 52.52%) (N:  0.11%) PV: Ba6 
info string  Kd2 ->       0 (V: 52.52%) (N:  0.14%) PV: Kd2 
info string  Qh5 ->       0 (V: 52.52%) (N:  0.16%) PV: Qh5 
info string  Qf3 ->       0 (V: 52.52%) (N:  0.17%) PV: Qf3 
info string  Ke2 ->       0 (V: 52.52%) (N:  0.18%) PV: Ke2 
info string  Bg5 ->       1 (V: 43.16%) (N:  0.31%) PV: Bg5 Qxg5
info string  Qg4 ->       1 (V: 43.98%) (N:  0.19%) PV: Qg4 d5
info string   b4 ->       1 (V: 47.76%) (N:  0.24%) PV: b4 Bxb4+
info string  Bd2 ->       1 (V: 48.45%) (N:  0.42%) PV: Bd2 Nxd4
info string  Qd2 ->       1 (V: 48.67%) (N:  0.33%) PV: Qd2 d5
info string   f3 ->       1 (V: 49.04%) (N:  0.26%) PV: f3 d5
info string  Qe2 ->       1 (V: 49.29%) (N:  0.43%) PV: Qe2 Nxd4
info string  Na3 ->       1 (V: 49.84%) (N:  0.48%) PV: Na3 d5
info string  Bc4 ->       1 (V: 49.85%) (N:  0.35%) PV: Bc4 d5
info string  Qd3 ->       1 (V: 50.46%) (N:  0.24%) PV: Qd3 d5
info string  Nh3 ->       1 (V: 50.66%) (N:  0.44%) PV: Nh3 d5
info string   g4 ->       1 (V: 51.68%) (N:  0.29%) PV: g4 d5
info string   b3 ->       2 (V: 47.63%) (N:  0.33%) PV: b3 d5 e5
info string  Nd2 ->       2 (V: 48.00%) (N:  0.69%) PV: Nd2 Nxd4 c3
info string  Be3 ->       2 (V: 49.74%) (N:  0.77%) PV: Be3 d5 Nc3
info string   f4 ->       2 (V: 50.24%) (N:  0.47%) PV: f4 d5 e5
info string  Bf4 ->       2 (V: 50.34%) (N:  0.62%) PV: Bf4 d5 Nc3
info string   a4 ->       2 (V: 50.50%) (N:  0.70%) PV: a4 d5 e5
info string  Bd3 ->       3 (V: 49.15%) (N:  0.98%) PV: Bd3 Nxd4 Nf3 Nxf3+
info string   h4 ->       3 (V: 52.04%) (N:  0.62%) PV: h4 d5 e5 h6
info string   e5 ->       4 (V: 48.80%) (N:  1.01%) PV: e5 d6 Nf3 dxe5 Nxe5
info string  Bb5 ->       5 (V: 50.36%) (N:  1.21%) PV: Bb5 d5 exd5 exd5 Nf3
info string   a3 ->       5 (V: 51.51%) (N:  1.05%) PV: a3 d5 e5 f6 Nf3 fxe5
info string  Ne2 ->       6 (V: 50.61%) (N:  1.32%) PV: Ne2 d5 Nbc3 Nf6 e5 Nd7 g3
info string   g3 ->       7 (V: 51.14%) (N:  1.53%) PV: g3 d5 e5 f6 Nf3 fxe5 dxe5
info string   h3 ->       7 (V: 51.42%) (N:  1.30%) PV: h3 d5 e5 f6 Nf3 fxe5
info string   c3 ->       9 (V: 50.57%) (N:  2.06%) PV: c3 d5 e5 f6 Nf3 fxe5 Nxe5 Nxe5
info string  Be2 ->      20 (V: 50.92%) (N:  4.07%) PV: Be2 d5 e5 f6 Nf3 fxe5 dxe5 Nge7 c4
info string  Nc3 ->      23 (V: 51.26%) (N:  4.49%) PV: Nc3 d5 Nf3 Nf6 e5 Ne4 Bd3 Nxc3
info string   c4 ->      46 (V: 52.71%) (N:  5.61%) PV: c4 Bb4+ Nc3 Nf6 e5 Ne4 Qc2 d5 Nf3 Nxc3
info string   d5 ->     300 (V: 51.86%) (N: 36.63%) PV: d5 exd5 exd5 Ne5 Be2 Nf6 Nf3 Nxf3+ Bxf3 Bd6 O-O O-O Nc3 Re8
info string  Nf3 ->     319 (V: 52.63%) (N: 29.72%) PV: Nf3 d5 Nc3 Nf6 e5 Ne4 Bd3 Bb4 Bd2 Nxd2 Qxd2 Be7

info depth 17 nodes 782 nps 294 score cp 7 winrate 52.02% time 1664 pv Nf3 d5 Nc3 Nf6 e5 Ne4 Bd3 Bb4 Bd2 Nxd2 Qxd2 Be7
Game ply: 4, root temperature:  0.48 
Visits: 319 Exponentiated visits: 179674.30 Cumulative visits: 179674.30
Visits: 300 Exponentiated visits: 157949.09 Cumulative visits: 337623.41
Visits: 46 Exponentiated visits:  3086.63 Cumulative visits: 340710.03
Visits: 23 Exponentiated visits:   720.67 Cumulative visits: 341430.72
Visits: 20 Exponentiated visits:   537.47 Cumulative visits: 341968.19
Visits: 9 Exponentiated visits:   100.60 Cumulative visits: 342068.78
Visits: 7 Exponentiated visits:    59.37 Cumulative visits: 342128.16
Visits: 7 Exponentiated visits:    59.37 Cumulative visits: 342187.53
Visits: 6 Exponentiated visits:    42.96 Cumulative visits: 342230.50
Visits: 5 Exponentiated visits:    29.30 Cumulative visits: 342259.81
Visits: 5 Exponentiated visits:    29.30 Cumulative visits: 342289.12
Visits: 4 Exponentiated visits:    18.34 Cumulative visits: 342307.47
Visits: 3 Exponentiated visits:    10.03 Cumulative visits: 342317.50
Visits: 3 Exponentiated visits:    10.03 Cumulative visits: 342327.53
Visits: 2 Exponentiated visits:     4.28 Cumulative visits: 342331.81
Visits: 2 Exponentiated visits:     4.28 Cumulative visits: 342336.09
Visits: 2 Exponentiated visits:     4.28 Cumulative visits: 342340.38
Visits: 2 Exponentiated visits:     4.28 Cumulative visits: 342344.66
Visits: 2 Exponentiated visits:     4.28 Cumulative visits: 342348.94
Visits: 2 Exponentiated visits:     4.28 Cumulative visits: 342353.22
Visits: 1 Exponentiated visits:     1.00 Cumulative visits: 342354.22
Visits: 1 Exponentiated visits:     1.00 Cumulative visits: 342355.22
Visits: 1 Exponentiated visits:     1.00 Cumulative visits: 342356.22
Visits: 1 Exponentiated visits:     1.00 Cumulative visits: 342357.22
Visits: 1 Exponentiated visits:     1.00 Cumulative visits: 342358.22
Visits: 1 Exponentiated visits:     1.00 Cumulative visits: 342359.22
Visits: 1 Exponentiated visits:     1.00 Cumulative visits: 342360.22
Visits: 1 Exponentiated visits:     1.00 Cumulative visits: 342361.22
Visits: 1 Exponentiated visits:     1.00 Cumulative visits: 342362.22
Visits: 1 Exponentiated visits:     1.00 Cumulative visits: 342363.22
Visits: 1 Exponentiated visits:     1.00 Cumulative visits: 342364.22
Visits: 1 Exponentiated visits:     1.00 Cumulative visits: 342365.22
Visits: 0 Exponentiated visits:     0.00 Cumulative visits: 342365.22
Visits: 0 Exponentiated visits:     0.00 Cumulative visits: 342365.22
Visits: 0 Exponentiated visits:     0.00 Cumulative visits: 342365.22
Visits: 0 Exponentiated visits:     0.00 Cumulative visits: 342365.22
Visits: 0 Exponentiated visits:     0.00 Cumulative visits: 342365.22
Visits: 0 Exponentiated visits:     0.00 Cumulative visits: 342365.22
pick, pick_scaled: 1494472530, 238258.12
bestmove d4d5

In this particular example, there are two candidate moves with almost equal visit counts. Those two still get nearly equal selection chances at this root temperature, while the other candidate moves are already strongly suppressed (only ~1.5 % chance for all of them together, while with t=1 it would be 22.6 %).

I didn't create detailed debug output specifically for the selection chances, but you can see them by comparing exponentiated visit counts.

@killerducky
Collaborator

killerducky commented Apr 8, 2018

I don't know the position, but it looks like this is moving the bishop and letting it get captured for free. I'd like a solution that sets the probability of this move being selected to zero.
info string Bg5 -> 1 (V: 43.16%) (N: 0.31%) PV: Bg5 Qxg5

I'd like to collect several positions, analyze them, and set a floor. Again I don't know the position here, but it seems like setting the floor to include maybe Be2 but not c3 seems reasonable.

Edit: A floor can be implemented by saying pick only top X% after doing the exponent on visits. And/or exclude moves where the V% is X% below the top move.

@jkiliani
Contributor Author

jkiliani commented Apr 8, 2018

I included the move history now. You're right, Bg5 lets the bishop be captured by the black queen, and the value head is fully aware of that fact. However, I think a 3-in-a-million chance of this actually happening is acceptable. Eventually, with stronger nets, Bg5 would not get a visit anymore in this position, since its policy prior will continue to drop.

The tournament I'm currently running shows that with -d 25, this branch is actually stronger than current Lc0 with Dirichlet noise:

lc_id103 vs sf_lv10: 20 - 10 - 7
lc_id103d vs sf_lv10: 24 - 5 - 9
lc_id103d vs lc_id103: 13 - 9 - 15

Later in the game (not that much later, actually), 1-visit moves are completely suppressed by the limits of single-precision floating-point accuracy. I think an additional hard lower cap is unnecessary at this point.

@killerducky
Collaborator

killerducky commented Apr 8, 2018

With noise on, it will pretty much always visit every root move at least once. What are the arguments against adding a floor? It should be very easy, and I don't see any downside. 1 in a million is extreme, but there will be 1-in-a-thousand moves too that are pretty bad. We have over a thousand people using this; it's going to play those moves a lot, and we're going to get support questions about them.

I think the goal should be to get a solution that's worthy of putting into a real tournament like TCEC. So it should be picking only moves deemed acceptable alternatives by the search. Mathematical definition of acceptable is what this PR should be looking for.

Edit:
auto pick_scaled = pick*accum/int_limit; looks like you can just divide this by X.
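Assuming the pick is drawn against cumulative weights sorted by descending visits (as in the debug dump above), dividing the scaled pick by X restricts the draw to the first 1/X of the total mass. A rough sketch with hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the "divide by X" floor: with cumulative exponentiated weights
// sorted by descending visits, scaling the pick into [0, total/X) means only
// moves inside the top 1/X of the mass can ever be chosen.
// unit_pick is a uniform random number in [0, 1).
int pick_top_fraction(const std::vector<double>& cumulative,
                      double unit_pick, double X) {
    double pick_scaled = unit_pick * cumulative.back() / X;
    for (std::size_t i = 0; i < cumulative.size(); ++i)
        if (pick_scaled < cumulative[i]) return static_cast<int>(i);
    return static_cast<int>(cumulative.size()) - 1;
}
```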

@jkiliani
Contributor Author

jkiliani commented Apr 8, 2018

There's no point in using noise in evaluation games anymore with fractional temperature. I'd kind of like a solution where decay constant 0 reproduces the current t=1 behaviour, which setting a floor doesn't help with.

I suppose putting in a floor that sets the exponentiated visit count to zero for any move where it's less than 1/1000 of the PV's should work somewhat like you describe; this would already set the selection probability of 1-visit blunders to zero at t~0.5 (which is reached by ply 4 when using decay constant 25).

For t=1, it would remove 1-visit nodes from selection as long as the total visit count is larger than 1000. Would this impact self-play games negatively in any way? I doubt it personally but we should discuss...

Edit: I'll put in the floor of 1/1000 if a few other people weigh in on this. @glinscott, @Error323, @Uriopass, any opinions?
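The 1/1000 floor proposed here might look like this sketch (hypothetical helper, not the PR's code): zero out exponentiated weights below 1/1000 of the best move's weight before the cumulative sum.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch of the proposed floor: any exponentiated visit count below
// best / 1000 is zeroed, so 1-visit flukes get exactly zero selection
// probability once the top move's weight is large enough.
void apply_exp_visit_floor(std::vector<double>& exp_visits,
                           double ratio = 1.0 / 1000.0) {
    if (exp_visits.empty()) return;
    double best = *std::max_element(exp_visits.begin(), exp_visits.end());
    for (double& w : exp_visits)
        if (w < best * ratio) w = 0.0;
}
```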

@killerducky
Collaborator

I think for now we should not impact self-play at all. No noise, I guess that can work.

It seems this information is important enough to integrate into the main dump_stats output. We should be able to look at game logs to determine which moves it is picking and why. How about, after printing the unaltered V count, printing something like:

std::pow(child->get_visits(),1/tau) / accum * 100

Then users can see what % each move will be picked without doing math.
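The suggested one-liner, wrapped as a function for illustration (get_visits() and accum stand in for the real accessors; the function name is hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Sketch of the suggested dump_stats column: the percent chance a child is
// picked, std::pow(visits, 1/tau) / accum * 100, where accum is the sum of
// all children's exponentiated visit counts.
double pick_percent(int visits, double tau, double accum) {
    return std::pow(static_cast<double>(visits), 1.0 / tau) / accum * 100.0;
}
```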

@jkiliani
Contributor Author

jkiliani commented Apr 8, 2018

OK I'm going to implement this, good idea. Currently, root output looks like this:

info string Nxd5 ->      59 (V: 49.98%) (N:  6.72%) PV: Nxd5 Nxe4 Ne3 Bc5 d3 Nf6 Nf3 Nc6 b4 Bb6 Bb2 O-O Be2
info string exd5 ->     383 (V: 48.69%) (N: 78.82%) PV: exd5 Nxd5 Nf3 Nxc3 bxc3 e4 Nd4 c5 Nb5 a6 d3 axb5 dxe4 Qxd1+

So should I put in an additional column, e.g.

info string Nxd5 ->      59 (V: 49.98%) (N:  6.72%) (S:  2.32%) PV: Nxd5 Nxe4 Ne3 Bc5 d3 Nf6 Nf3 Nc6 b4 Bb6 Bb2 O-O Be2
info string exd5 ->     383 (V: 48.69%) (N: 78.82%) (S: 97.68%) PV: exd5 Nxd5 Nf3 Nxc3 bxc3 e4 Nd4 c5 Nb5 a6 d3 axb5 dxe4 Qxd1+

where S shows the root move selection probabilities?

Or directly after visit count, like

info string Nxd5 ->      59 ( 2.32%)   (V: 49.98%) (N:  6.72%)  PV: Nxd5 Nxe4 Ne3 Bc5 d3 Nf6 Nf3 Nc6 b4 Bb6 Bb2 O-O Be2
info string exd5 ->     383 (97.68%)   (V: 48.69%) (N: 78.82%)  PV: exd5 Nxd5 Nf3 Nxc3 bxc3 e4 Nd4 c5 Nb5 a6 d3 axb5 dxe4 Qxd1+

In this case, I might have to either include 5 decimal places so the fluke moves don't show as 0.00%, or actually implement some sort of floor after all...

@jkiliani
Contributor Author

jkiliani commented Apr 8, 2018

@killerducky I thought about how to implement your suggestion to include the move probabilities in the main dump_stats. Unfortunately, I think we need some refactoring of both the root temperature calculation in UCTSearch::get_best_move() and the exponentiated visit count calculation in UCTNode::randomize_first_proportionally, since both calculations are currently performed locally. Do you have a suggestion for how to get those stats into dump_stats without duplicating both calculations?

Edit: I guess what would work is to define new functions UCTNode::get_expvisits(tau) and UCTSearch::get_root_temp(). I'll put this refactoring in later (unless you'd like to do it first 😀 )

@Akababa
Contributor

Akababa commented Apr 8, 2018

@killerducky It's important to keep in mind that the moves played in self-play are not the ones the policy trains toward, so it's important to keep them in the training data because they lead to a variety of positions which are encountered by the MCTS in a normal search. As it's a local policy improvement operator, we need to include all positions in some neighborhood of the target population, or else the network might forget how to evaluate bad positions.

Anyway, in this case it will make a negligible difference (3 in a million), but IMO setting a "threshold" cannot be beneficial and may even hurt.

@killerducky
Collaborator

@Akababa this mode is only intended for matches, not self-play, for the reason you just mentioned.

@Akababa
Contributor

Akababa commented Apr 8, 2018

@jkiliani are you talking about the threshold mode? Wouldn't matches be played with tau=0 (pick max)?

@jkiliani
Contributor Author

jkiliani commented Apr 8, 2018

@Akababa At the moment, matches are played with tau=0, but that necessitates using Dirichlet noise to avoid deterministic play. The whole point of this pull request is to replace Dirichlet noise with fractional temperature, in match games and probably in play against humans, to get more variability in positions, ideally without a loss in strength. So far this looks successful with a temperature decay constant of 25: the performance of the engine with decaying temperature is almost equal to that with Dirichlet noise, both head-to-head and against Stockfish Level 10.

@@ -60,6 +61,7 @@ bool cfg_tune_only;
float cfg_puct;
float cfg_softmax_temp;
float cfg_fpu_reduction;
float cfg_root_temp;
@glinscott (Owner)

It looks like we don't change this value anywhere? So we could just inline the 1.0 in the one place it's used.

@jkiliani (Contributor Author) Apr 8, 2018

We could, I suppose; it's a question of whether we want all constants collected in the Parameters file, or only defined where we use them.

I was actually considering pulling the Dirichlet noise constants alpha and epsilon into Parameters.cpp, but is your general preference towards having only multiple-use constants in the Parameters file?

Having thought about it, changing cfg_root_temp would actually be rather impractical. Might be better to just get rid of it.

@jkiliani
Contributor Author

jkiliani commented Apr 8, 2018

Just finished the first strength testing of this pull request. I matched Id 103 on the current /next branch with Dirichlet noise against this branch without noise but with a temperature decay schedule of -d 25, with Stockfish Level 10 as an outside point of comparison. Lc0 was given 800 visits per move.

Rank Name                          Elo     +/-   Games   Score   Draws
   1 lc_id103d                      67      42     200   59.5%   25.0%
   2 lc_id103                       65      43     200   59.3%   23.5%
   3 sf_lv10                      -137      47     200   31.3%   16.5%

I'd call this a very successful first test: at least with this temperature decay schedule, no strength loss compared to using Dirichlet noise for randomness is measurable, while the opening variety is much better. As an aside, Id 103 seems decisively better than Stockfish Level 10.

I'm now running a second strength test with Id 107, this time matching multiple decay schedules against each other.

About the code: I removed cfg_root_temp and will try to implement @killerducky's suggestion, which requires refactoring the root temperature calculation. When the addition to dump_stats is in and the second test is finished, I would consider this ready.

jkiliani added 2 commits April 9, 2018 01:43
Removes cfg_root_temp from Parameters.cpp as configurable parameter. Factors out get_root_temperature as a function. Includes the probability to play each root move in UCTSearch::dump_stats, if temperature is used.
Remove cfg_root_temp, enhance dump_stats
@jkiliani
Contributor Author

jkiliani commented Apr 8, 2018

I've now put in the root move probabilities, shown while temperature is used. I tried to generalise this to also handle the case with no temperature set (i.e. show 100% for the first root node and 0% for all others) but didn't manage that part. If someone else (@killerducky?) would like to optimise this, please feel free.

@jkiliani jkiliani changed the title [WIP] Implement root move temperature Implement root move temperature Apr 8, 2018
@glinscott glinscott merged commit bca43df into glinscott:next Apr 9, 2018
@glinscott
Owner

Awesome, thanks so much!

@jkiliani
Contributor Author

jkiliani commented Apr 9, 2018

Current standing in my tournament with Id 107:

lc_id107 results:     46 - 38 - 39 (+23 Elo)
lc_id107_d25 results: 48 - 30 - 44 (+52 Elo)
lc_id107_d15 results: 46 - 43 - 33 (+9 Elo)
lc_id107_d10 results: 43 - 43 - 37 (0 Elo)
lc_id107_d5 results:  43 - 45 - 34 (-6 Elo)
sf_lv10 results:      33 - 60 - 29 (-78 Elo)

The tournament is still running and will probably finish sometime this evening, but I think I can already draw some conclusions from the data so far: temperature decay works robustly for multiple decay schedules, and the strength loss from using a slower decay schedule than my original test with -d 25 is surprisingly small. Even -d 5 still appears to be a realistic choice. --tempdecay offers a tuneable trade-off between playing strength and variety of play, which should prove popular both for matches and for people running bots against other engines and against humans.

@jkiliani
Contributor Author

jkiliani commented Apr 9, 2018

The tournament for Id 107 is finished now:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 lc_id107                       38      41     200   55.5%   29.0%
   2 lc_id107_d25                   24      38     200   53.5%   38.0%
   3 lc_id107_d5                    12      40     200   51.7%   30.5%
   4 lc_id107_d15                    0      41     200   50.0%   29.0%
   5 lc_id107_d10                  -19      40     200   47.3%   32.5%
   6 sf_lv10                       -56      43     200   42.0%   22.0%

In the end, there was a small strength cost to using temperature decay, even with d=25. The performance of d=5 is strange and probably just an artifact of the still rather weak statistics, but it seems safe to say that d=5 remains a valid option.

I'm going to do one more match of this format, replacing d=10 with d=50 to look at the performance of faster temperature decay.

@jkiliani jkiliani restored the jkiliani-patch-2 branch April 10, 2018 12:42
@jkiliani
Contributor Author

And finally, the tournament for Id 112:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 lc_id112_d50                   53      40     200   57.5%   33.0%
   2 lc_id112                       53      40     200   57.5%   31.0%
   3 lc_id112_d25                   40      41     200   55.8%   29.5%
   4 lc_id112_d10                  -12      40     200   48.3%   30.5%
   5 lc_id112_d5                   -42      40     200   44.0%   31.0%
   6 sf_lv10                       -92      45     200   37.0%   17.0%

I think this shows that with a quick enough temperature decay schedule, a loss in strength can be avoided entirely, since the cost of temperature is offset by no longer needing Dirichlet noise. I would still recommend a somewhat slower decay schedule than -d 50 for match games, to get more opening variety. How about -d 10?

With these results, I would consider the temperature decay implementation sufficiently tested to be used. Outside of "official" use in matches, the choice of the decay constant is up to the user. I did not test the strength cost of Dirichlet noise compared to no randomness at all, but picking a really large decay constant likely already approximates that closely.

@jjoshua2
Contributor

I think we should start small, see how big a problem duplicate games are, and gradually increase until they're no longer a problem. It should be pretty easy for Gary to try, say, d25 at the start, then go to d15 a day later, and if still necessary d10, but I think we don't really need much variety with only 400-500 games. SF uses a 4-ply book just fine for even 50,000 games, without extra noise or even the usual multi-threaded variability.

7 participants