Add recursive search depth, remove FPU VL bug #466

Merged
merged 7 commits into from
May 2, 2018


jkiliani
Contributor

Adds recursive tracking for maximum search depth and replaces the current search depth output based on a log function of visits. Fixes the virtual loss bug on first play urgency. Activates USE_TUNER by default, and adds FPU reduction to parameters tuneable from command line.

@jkiliani
Contributor Author

jkiliani commented Apr 29, 2018

As far as I can determine, the bench impact of recursive depth tracking is negligible. Judging from tests with the previous PR #438, fixing the virtual loss bug in FPU also seems doable without negative effects. Results of the last tuning tournament with the code from #438, using Id 211:

Rank Name                           Elo     +/-   Games   Score   Draws  
   1 lc_next_puct06                  51      56      83   57.2%   44.6%
   2 lc_fixedVL_puct06               37      57      84   55.4%   41.7%
   3 lc_fixedVL_puct085             -29      55      83   45.8%   45.8%
   4 lc_next_puct085                -60      57      82   41.5%   43.9%

At least for self-play, the reduced PUCT appears significantly stronger for multiple nets. The difference between before fixing the VL bug and after appears statistically insignificant. I'll now do some tests with the code from this PR, again comparing /next and the fixed VL version, at different PUCT values.
I'll also add 0.1 FPU reduction to the tested parameters.

Some more relevant results with older networks (all 800 visits):
Id 194:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 lc_next_puct06                 43      65      57   56.1%   49.1%
   2 lc_fixedVL_puct06              31      58      57   54.4%   59.6%
   3 lc_fixedVL_puct085             12      71      57   51.8%   40.4%
   4 lc_next_puct085                 6      68      58   50.9%   43.1%

Id 202:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 lc_fixedVL_puct06              73      58      63   60.3%   54.0%
   2 lc_next_puct06                 33      62      63   54.8%   49.2%
   3 lc_next_puct085                28      65      63   54.0%   44.4%
   7 lc_fixedVL_puct085            -44      68      63   43.7%   39.7%

The latter two tournaments were round-robins that also included some now-discarded options for FPU reduction, so the Elo values don't sum to 0. However, common to all test results is that puct = 0.6 is somewhat stronger than 0.85, and that the strength difference between the next branch and the version with the VL bug fixed is statistically insignificant.

Current cutechess-cli script
./cutechess-cli -rounds 50 -tournament round-robin -concurrency 2 -pgnout results_prog.pgn \
 -engine name=lc_puct6 cmd=lczero_dyneval arg="--threads=1" arg="--weights=$WDR/weights_217.txt" arg="--puct=0.6" arg="--tempdecay=10" nodes=800 tc=inf \
 -engine name=lc cmd=lczero_dyneval arg="--threads=1" arg="--weights=$WDR/weights_217.txt" arg="--tempdecay=10" nodes=800 tc=inf \
 -engine name=lc_fpu01_puct6 cmd=lczero_dyneval arg="--threads=1" arg="--weights=$WDR/weights_217.txt" arg="--fpu_reduction=0.1" arg="--puct=0.6" arg="--tempdecay=10" nodes=800 tc=inf \
 -engine name=lc_fpu01 cmd=lczero_dyneval arg="--threads=1" arg="--weights=$WDR/weights_217.txt" arg="--fpu_reduction=0.1" arg="--tempdecay=10" nodes=800 tc=inf \
 -engine name=lc_next_puct6 cmd=lczero_backup arg="--threads=1" arg="--weights=$WDR/weights_217.txt" arg="--puct=0.6" arg="--tempdecay=10" nodes=800 tc=inf \
 -engine name=lc_next cmd=lczero_backup arg="--threads=1" arg="--weights=$WDR/weights_217.txt" arg="--tempdecay=10" nodes=800 tc=inf \
 -each proto=uci

(Insert your working directory for the weights, and export the paths for the lczero executables first)

const auto& cur = bh.cur();
const auto color = cur.side_to_move();

auto result = SearchResult{};

node->virtual_loss();
if (ndepth > m_maxdepth) {
    m_maxdepth = ndepth;
}
Contributor

If you actually want to make this truly thread-safe, m_maxdepth would need to be a std::atomic<int> and the update would need to be something like:
int cur_depth;
do { cur_depth = m_maxdepth; } while (ndepth > cur_depth && !m_maxdepth.compare_exchange_strong(cur_depth, ndepth));

But maybe we don't care that much?
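
For reference, the compare-and-swap loop suggested above can be written as a small standalone helper. This is only a sketch under assumptions: the name `update_max_depth` is hypothetical and not part of this PR, and it assumes the depth counter is declared as `std::atomic<int>`.

```cpp
#include <atomic>

// Hypothetical helper (not from the PR): raise `maxdepth` to `ndepth`
// if `ndepth` is larger, without losing updates from other threads.
inline void update_max_depth(std::atomic<int>& maxdepth, int ndepth) {
    int cur = maxdepth.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak stores the current value in `cur`,
    // so the loop re-checks whether `ndepth` is still the larger one.
    while (ndepth > cur &&
           !maxdepth.compare_exchange_weak(cur, ndepth,
                                           std::memory_order_relaxed)) {
    }
}
```

Relaxed ordering is assumed to be enough here because the value is only reported as a statistic; no other data is published through it.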

Contributor Author

You're right, it probably isn't really thread-safe at the moment. However, @killerducky indicates it won't be pulled as is anyway: he prefers the solution that defines search depth by PV length, since that is more compatible with the Tensorflow implementation.

I'm just keeping this open now until I post the last tuning tests with it, and then close it. Making additional tuning options available without UCI can be its own PR, and fixing the VL bug will probably have to wait a bit...

@jkiliani
Contributor Author

Tuning test result with Id 217:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 lc_fpu01_puct06                70      52      95   60.0%   46.3%
   2 lc_next_puct06                 62      52      97   58.8%   45.4%
   3 lc_fpu00_puct06                18      54      97   52.6%   39.2%
   4 lc_next_puct085               -36      55      96   44.8%   39.6%
   5 lc_fpu00_puct085              -55      59      96   42.2%   30.2%
   6 lc_fpu01_puct085              -59      54      95   41.6%   41.1%

At least for this net, an FPU reduction of 0.1 gives enough benefit to compensate for fixing the VL bug. I have now started a tuning run with Id 226 (i.e. the final 10 block net).

@killerducky
Collaborator

Results:
lczero-id228-av539-puct0p6 vs lczero-id228-fpu-v1-fix-fpu-0p1:
70-40-193
Elo diff: 34.51 +/- 23.48

av539 refers to the AppVeyor build number.
The other one is this PR, with --fpu_reduction=0.1
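
The Elo difference quoted above follows from the win-loss-draw record under the usual logistic Elo model. The sketch below is only illustrative: the function names are hypothetical, and it does not reproduce how the match tool derives the +/- error margin.

```cpp
#include <cmath>

// Score fraction from a record; a draw counts as half a point.
inline double score_fraction(int wins, int losses, int draws) {
    return (wins + 0.5 * draws) /
           static_cast<double>(wins + losses + draws);
}

// Elo difference implied by a score fraction under the logistic model.
// Not defined for a score of exactly 0 or 1.
inline double elo_diff(double score) {
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

For the record 70-40-193, `elo_diff(score_fraction(70, 40, 193))` comes out at roughly 34.5, in line with the figure reported above.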

This PR is 34.5 Elo worse on id228, but @mooskagh points out a bigger issue in #317: FPU reduction is being applied by mistake on the root node even when noise is on (which implies training). This will make it more difficult for the network to learn about low-policy moves.

@jkiliani I think we should change the default fpu_reduction to 0.1 (best we have for now), and merge this.

@jkiliani
Contributor Author

jkiliani commented May 1, 2018

One more test result, for Id 226:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 lc_fpu00_puct6                 77      58      78   60.9%   44.9%
   2 lc_fpu01_puct6                 72      57      78   60.3%   46.2%
   3 lc_next_puct6                  27      54      78   53.8%   51.3%
   4 lc_fpu00_puct085              -22      55      78   46.8%   50.0%
   5 lc_fpu01_puct085              -36      57      78   44.9%   46.2%
   6 lc_next_puct085              -120      59      78   33.3%   43.6%

Here the bug also doesn't help playing strength, for puct=0.6. FPU reduction of 0.1 doesn't hurt compared to none, but does not help a lot either.

@killerducky
Collaborator

killerducky commented May 1, 2018

I spoke with @jkiliani about this; we will do one final test of FPU 0 vs 0.1 head to head on another net and then decide tomorrow.

Edit: Going to test FPU 0.05 also.

@jkiliani
Contributor Author

jkiliani commented May 2, 2018

Accidentally had my test script stop after just 50 games, on Id 231:

Score of lc_fpu00 vs lc_fpu01: 6 - 16 - 28  [0.400] 50
Elo difference: -70.44 +/- 64.16

I'm going to continue the match with this net until @killerducky wants to decide, but for the moment this is another (although weak) indication supporting cfg_fpu_reduction = 0.1.

Current standing of the continued test is

Score of lc_fpu00 vs lc_fpu01: 7 - 6 - 15  [0.518] 28

which means that FPU reduction = 0.1 is still well ahead in the aggregate score. But most likely, any of 0.0, 0.05 and 0.1 would work fine.

@killerducky
Collaborator

Rank Name                                  Elo     +/-   Games   Score   Draws
   1 lczero-id232-fpu-vl-fix-fpu-0p1       22      30     174   53.2%   67.2%
   2 lczero-id232-fpu-vl-fix-fpu-0p05       0      30     174   50.0%   66.7%
   3 lczero-id232-fpu-vl-fix-fpu-0        -22      30     172   46.8%   65.7%
261 of 6000 games finished.

Let's go with 0.1.

@killerducky killerducky merged commit 30bb68c into glinscott:next May 2, 2018
@killerducky
Collaborator

@jkiliani ok I am preparing a release and I noticed the depth tracking is wrong:

info depth 4 nodes 800 nps 435 tbhits 0 score cp -419 time 22 pv f7d7 d3g6 h7g8 g6h5 d7d2 e6e2 d2e2 h5e2 g7e5 f1f2 g8g7 f2f3 e5f6 f3e4 g7g6
move played f7f4

The PV is longer than the depth. Probably something to do with tree reuse? I'm going to revert the depth change so I can get v0.8 out quickly.

killerducky added a commit to killerducky/leela-chess that referenced this pull request May 3, 2018
@killerducky killerducky mentioned this pull request May 3, 2018
killerducky added a commit that referenced this pull request May 3, 2018
* Revert depth calculation.

See PR #466.
@jkiliani jkiliani deleted the jkiliani-patch-2 branch May 3, 2018 04:31