
Add history to the NNCache key #546

Merged: 9 commits merged into glinscott:next on May 7, 2018
Conversation

Tilps (Contributor) commented May 6, 2018

I'm still testing this, but a previous version, which hashed history planes 2-7 into the key with a simple xor instead of into full_key with *31 xor mixing, was +50 ±46 Elo in self-play at 40/1 time control.
This version trades even more NNCache hits for NN evaluation accuracy, so it may do better or worse; it's hard to tell at this point. But I thought my logic for the previous version wasn't quite 'zero' enough. This version is 'just a bug fix', whereas the other one was being a bit 'smart' about assuming what the NN gets out of the history.

Note: I don't intend for this to be submitted as a command line option - once I've demonstrated that the Elo gain is positive under time control, I'll remove the command line option from the PR. I'm just using it for my testing purposes.

Tilps (Contributor, Author) commented May 6, 2018

So my test is ongoing - but it's still looking likely to be a ~50 Elo win for this new version of the hash key (I'm at 230 games in a gauntlet of enabled and disabled vs Pawny 1.2). So the nps regression is pretty clearly overcome by what must be an even larger Elo win in visit quality.

To expand on my comment above that this is a 'bug fix': upon further reflection, there is actually a fair amount of symmetry between this and the promotion bug. Both effectively lie to training - one by evaluating all promotions as knights, the other by reusing results without checking that the history matches. Although more subtle and probably not quite as significant, this bug has been in the training for much longer. Unlike the pawn promotion bug, this one has a performance cost to fix, but like the pawn promotion bug, I think we should expect to see significant improvements in net quality once we start training with the bug removed.

An argument could be made to remove the NNCache entirely, but I'll give three reasons why that is not ideal.

  1. Even with the stricter key, I still see some cache hits during evaluation. (And these are not hash collisions, I've checked and the NN output was the same whether cache was enabled or disabled in this situation.)
  2. The NNCache is essential if we want to support disabling tree reuse in some scenarios. (Specifically I want to do this for training so noise is applied evenly.)
  3. People will want to use Leela in an evaluation setting in the future; in these scenarios you might switch back and forth between branches of the move tree, which tree reuse will not handle. (This use case will probably need a much larger NNCache to be especially useful, but we can easily make that a UCI parameter.)

dubslow (Contributor) commented May 6, 2018

I'm all for this PR, although I must admit I'm even more convinced that we should test a Leela without history at all.

Tilps (Contributor, Author) commented May 7, 2018

Completed the gauntlet - the new key is +61 ±35 Elo compared to the old key under 40/1 time control vs Pawny.

Tilps (Contributor, Author) commented May 7, 2018

Cache hit rate drops as expected (under 'current' training conditions, i.e. still using tree reuse; without tree reuse these rates would be considerably higher on both sides).
Old key: ~15% hit rate.
New key: ~1% hit rate.

Tilps (Contributor, Author) commented May 7, 2018

I ran an additional test where each successful lookup also performed the NN eval and compared the results with a 0.1% tolerance.
Zero false-positive key matches, even with a 10x-sized cache left running for hours. (So reasonable confidence that the hashing is decent with the *31 mixing, and pretty strong confidence that any false positives that do exist won't affect training in any meaningful way.)

killerducky merged commit 1621b0f into glinscott:next on May 7, 2018
jjoshua2 (Contributor) commented May 9, 2018

2+2 with Syzygy against SM 120, so not a fair match:

Score of lczeroTB v8 id251 vs lczeroCache GPU: 6 - 0 - 6 [0.750]
Elo difference: 190.85 +/- 144.31

Retesting with both getting SM 120:

Score of lczeroTB v8 id251 vs lczeroCache GPU: 0 - 2 - 3 [0.300]
Elo difference: -147.19 +/- 210.73

So far it's going better...

Tilps (Contributor, Author) commented May 9, 2018

I think 2+2 (and higher) on a good GPU could quite plausibly be a regression. The false-positive cache hits under the old key would happen more frequently in a larger tree, and performance is a powerful Elo factor.
I still feel confident that this PR is an important step forward, as my testing was under visit counts closer to training levels. Once the training data no longer contains the bug, the power of the full key is likely to grow.
We may however want future command line options, outside of training, that balance key accuracy against cache hit rate to optimize performance in TCEC or similar settings.

jjoshua2 (Contributor) commented May 9, 2018

It's even now, so hopefully it's not too bad at least...
2+2, both SM 120, with Syzygy:

Score of lczeroTB v8 id251 vs lczeroCache GPU: 2 - 2 - 9 [0.500]
Elo difference: 0.00 +/- 107.96

I'm running 5 concurrent games on a 1080 Ti with 1 thread each, so each one gets maybe only 500-1000 nps.
Edit: now it's:

Score of lczeroTB v8 id251 vs lczeroCache GPU: 4 - 3 - 25 [0.516]
Elo difference: 10.86 +/- 56.65

Tilps (Contributor, Author) commented May 9, 2018

Yeah, my tests were ~500 nps at roughly 1+1.

jjoshua2 (Contributor) commented May 9, 2018

Continuation of the same run... stopping it now:

Score of lczeroTB v8 id251 vs lczeroCache GPU: 11 - 5 - 42 [0.552]
Elo difference: 36.07 +/- 46.76
