Endgame Net
I’ve started training Ender 160x10-se. I will need to release Ender128-90l, the strongest of the 128 nets.
I ran a comparison match with ID11258 and Ender against Komodo 12. The Leela Ratio was 0.75, and the time per move was 2.5 seconds. In these matches, ID11258 plays a game to completion, then Ender takes over at 16 pieces (16p) and replays the endgame, so two games are recorded per pairing.
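A minimal sketch, assuming python-chess, of how the handover point for the replay can be extracted from a finished game (the PGN file name is hypothetical):

```python
import chess.pgn

def endgame_start(game, threshold=16):
    """Return the first position in the game with <= threshold pieces."""
    board = game.board()
    for move in game.mainline_moves():
        board.push(move)
        if len(board.piece_map()) <= threshold:
            return board.fen()
    return None  # the game never reached the threshold

with open("id11258_vs_komodo12.pgn") as f:  # hypothetical file name
    game = chess.pgn.read_game(f)
    print(endgame_start(game))
```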
The results (divided by white and black) were:
White

   # PLAYER                  :  RATING  POINTS  PLAYED    (%)
   1 Franken-Ender-83        :  3517.9    23.5      44  53.4%
   2 Komodo 12               :  3494.0    43.0      88  48.9%
   3 ID11258 lc0 0.19.1-rc1  :  3486.0    21.5      44  48.9%

Black

   # PLAYER                  :  RATING  POINTS  PLAYED    (%)
   1 Komodo 12               :  3494.0    45.0      78  57.7%
   2 ID11258 lc0 0.19.1-rc2  :  3439.6    16.5      39  42.3%
   3 Franken-Ender-88        :  3439.6    16.5      39  42.3%
You'll note I switched from Ender 83 and RC1 for white to Ender 88 and RC2 for black.
The delta games, i.e. those where the two results differed, are here for white and black.
I’m going to go into a little more detail on the “dodgy” position training. TBD
I glued together ID11258 and Ender 83 to play as a single UCI engine, with Ender taking over when the piece count drops to 16 or fewer. Match conditions:
- All engines had access to 6 man tb.
- Openings were random 3-move Noomen openings, played twice with colors reversed.
- TC 1s per move.
- Leela Ratio 3.12
This Frankenstein did well against Komodo 12, but not so well vs SF10.
Komodo 12 results:
Score of Dual vs Komodo 12 TB 1: 6 - 0 - 14 [0.650]
Elo difference: 107.54 +/- 78.22
Stockfish 10 results:
Score of Dual vs Stockfish 10 TB: 3 - 7 - 10 [0.400]
Elo difference: -70.44 +/- 111.56
I’ve committed a “some assembly required” UCI wrapper to bolt Ender onto another net here.
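Below is a much-simplified sketch of the idea, assuming python-chess for position tracking; the engine paths, net files, and option handling are placeholders, not the committed wrapper's actual code.

```python
#!/usr/bin/env python3
# Sketch: proxy UCI commands to one of two lc0 instances, switching to the
# Ender net once the tracked position has 16 or fewer pieces.
import subprocess
import sys
import chess

MAIN_CMD = ["./lc0", "-w", "id11258.pb.gz"]        # hypothetical paths
ENDER_CMD = ["./lc0", "-w", "ender-latest.txt.gz"]
SWITCH_AT = 16  # hand over to Ender at 16 pieces or fewer

def start(cmd):
    return subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True, bufsize=1)

def relay(eng, stop, echo=True):
    """Forward an engine's output to stdout until a line starts with `stop`."""
    while True:
        reply = eng.stdout.readline()
        if echo:
            sys.stdout.write(reply)
            sys.stdout.flush()
        if reply.startswith(stop):
            return

main, ender = start(MAIN_CMD), start(ENDER_CMD)
board = chess.Board()

for line in sys.stdin:
    tokens = line.split()
    if not tokens:
        continue
    if tokens[0] == "position":
        # Track the position so we always know the current piece count.
        board = chess.Board()
        if "fen" in tokens:
            i = tokens.index("fen")
            board.set_fen(" ".join(tokens[i + 1:i + 7]))
        if "moves" in tokens:
            for uci in tokens[tokens.index("moves") + 1:]:
                board.push_uci(uci)
    if tokens[0] == "go":
        # Only the active engine searches; both already know the position.
        active = ender if len(board.piece_map()) <= SWITCH_AT else main
        active.stdin.write(line)
        active.stdin.flush()
        relay(active, "bestmove")
    else:
        # Everything else (uci, isready, position, ...) goes to both engines.
        for eng in (main, ender):
            eng.stdin.write(line)
            eng.stdin.flush()
        if tokens[0] == "uci":
            relay(main, "uciok")
            relay(ender, "uciok", echo=False)
        elif tokens[0] == "isready":
            relay(main, "readyok")
            relay(ender, "readyok", echo=False)
        elif tokens[0] == "quit":
            break
```

Forwarding everything except `go` to both engines keeps the passive engine's state in sync, so the 16-piece switch only changes which engine is asked to search.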
Some major developments. Thanks to @oscardssmith for suggesting the idea and providing some initial code for distilling problem positions: positions where Ender's value evaluation is at least 0.5 winrate away from a predicted value. Initially I used @oscardssmith's code to generate random 6-man positions, using WDL tablebase values as the predicted value.
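A minimal sketch of that filter, assuming python-chess, a local 6-man Syzygy set at an assumed path, and a hypothetical `net_value()` stand-in for querying Ender's value head (side-to-move winrate in [0, 1]):

```python
import random
import chess
import chess.syzygy

def random_position(n_pieces=6):
    """Scatter the two kings plus random pieces on distinct squares; retry until legal."""
    symbols = "QRBNPqrbnp"
    while True:
        board = chess.Board(None)  # empty board, no castling rights
        squares = random.sample(chess.SQUARES, n_pieces)
        board.set_piece_at(squares[0], chess.Piece.from_symbol("K"))
        board.set_piece_at(squares[1], chess.Piece.from_symbol("k"))
        for sq in squares[2:]:
            board.set_piece_at(sq, chess.Piece.from_symbol(random.choice(symbols)))
        board.turn = random.choice([chess.WHITE, chess.BLACK])
        if board.is_valid():
            return board

def collect_dodgy(net_value, tb_path="./syzygy", n=5000, gap=0.5):
    """Keep positions where the net's winrate is >= `gap` away from the TB value."""
    # Treating cursed wins / blessed losses (+-1) as draws is an assumption.
    wdl_to_winrate = {-2: 0.0, -1: 0.5, 0: 0.5, 1: 0.5, 2: 1.0}
    dodgy = []
    with chess.syzygy.open_tablebase(tb_path) as tb:
        while len(dodgy) < n:
            board = random_position()
            predicted = wdl_to_winrate[tb.probe_wdl(board)]  # side-to-move POV
            if abs(net_value(board) - predicted) >= gap:
                dodgy.append(board.epd())
    return dodgy
```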
I've been mixing in 5k dodgy positions per 20k for a few rounds, and the effect has been dramatic. Ender128-80 reached the best results yet, both on the endgame test suite and in play (vs sf9tb with 250k nodes).
Test Suite:
Ender 128-80 - 5 sec, 2 thr, 1.0 prune
Success rate: 63.09% (94/149)
Play:
Score of stockfishTB vs Ender128-80: 140 - 156 - 104 [0.480] 400
Elo difference: -13.90 +/- 29.33
Conditions: 16p start positions, SFNODES=250000, LCNODES=62000.
Batch 81 mixes in dodgy positions with higher piece counts, where the prediction comes from SF9 at 350k nodes. Note that no training data is taken from SF9; it is only used to filter the positions that Ender is trained on.
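For these larger positions, which are beyond tablebase coverage, the predicted value has to come from a search instead. Here is a hedged sketch using python-chess's engine module; the logistic centipawn-to-winrate mapping (scale 400) is my assumption, not necessarily the conversion actually used:

```python
import math
import chess.engine

def sf_predicted_value(board, engine, nodes=350_000):
    """Winrate prediction from an SF9 search, side-to-move POV."""
    info = engine.analyse(board, chess.engine.Limit(nodes=nodes))
    cp = info["score"].pov(board.turn).score(mate_score=10_000)
    return 1.0 / (1.0 + math.exp(-cp / 400.0))

# Usage (binary path is an assumption):
# engine = chess.engine.SimpleEngine.popen_uci("./stockfish9")
```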
Stay tuned.
- I’ve started training a 128x10 network and upped the target to 16p. I continue to train the 64x6 Ender network on the self-play games produced by the 128x10.
- I’ve started to add a random half-move clock value between 0 and 99 to 10% of the epd start positions. Typical evaluations for drawn positions have converged rapidly to 0. For example, some of the drawn endgames in the Carlsen-Caruana WC match went from 2.5-3.0 (also in 11258) to 0.1-0.4.
Ender 62 wins QvR, the first NN to do so! What was the difference? I was converting the epd’s to fen’s by tacking on a “ 0 1”. Now I’m tacking on “ X 80”, where X is a half-move clock value in the range 0-99 (sketched below). Also, self-play is now using 6 man TB.
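A sketch of the conversion, combining the “ X 80” scheme with the 10% rate from the later update above; the rate parameter is the only assumption:

```python
import random

def epd_to_fen(epd, rate=0.10):
    """epd: the first four FEN fields (board, side, castling, en passant)."""
    if random.random() < rate:
        # Random half-move clock 0-99, fixed fullmove counter of 80.
        return f"{epd} {random.randint(0, 99)} 80"
    return f"{epd} 0 1"
```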
Also, Ender is starting to get the best of sf9tb from 12-man positions: Ender 62 took 1.5-0.5 from a 12-man position played twice with colors reversed. Still early days, though.
TC 0.25 sec per move on 200 12-man positions (400 games):
Score of ID9149 vs Ender62: 94 - 144 - 162 [0.438] 400
Elo difference: -43.66 +/- 26.34
I’ve moved over mostly to self-play with 14-man epd’s. The high-water mark so far is Ender 52. In a match at 0.25s per move from 200 12-man epd’s, we get:
Score of stockfishTB vs Ender52: 146 - 101 - 153 [0.556] 400
Elo difference: 39.25 +/- 26.84
Score of stockfishTB vs 11258: 157 - 88 - 155 [0.586] 400
Elo difference: 60.54 +/- 26.81
Hopefully we can improve some more.
I've modified lc0 to take a file of epd's as starting positions. I'm now feeding it the same data as the adversarial play against sf9tb and alternating training on 20k batches of games. A 64x6 net really breezes through 20k endgame positions.
The command line for the self-play runs is:
./lc0.ender selfplay --training --games=20000 -w ender-latest.txt.gz --visits=800 --cpuct=5.0 --resign-percentage=0
It’s changed a bit. I am using temp=1 in self-play from 20k randomized 12-man epd’s, and I use noise for adversarial play with sf9tb. The node counts there are 350k for SF and 1600 for lc0.
I feed 2 adversarial batches for every 1 self-play batch; a toy sketch of the schedule follows.
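Here, `train_on()` and the two batch iterators are placeholders for the actual pipeline:

```python
def training_schedule(adversarial_batches, selfplay_batches, train_on):
    """Interleave two adversarial (vs sf9tb) batches per self-play batch."""
    for selfplay in selfplay_batches:
        train_on(next(adversarial_batches))
        train_on(next(adversarial_batches))
        train_on(selfplay)
```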
I’m going to play a match from 100 12-man epd’s, with colors reversed, against 11258.
T11258 - 5 sec, 2 thr, no prune
Success rate: 53.69% (80/149)
Ender 38 - 5 sec, 2 thr, no prune
Success rate: 62.42% (93/149)
Based on my most recent test suite run, I am hopeful.
The Ender net (64x6) was initially trained on ~400k semirandom 6-, 5-, 4-, and 3-man positions with perfect playouts by sf9tb. This led to mediocre play.
Currently the net is being trained on 20k batches of playouts (500k window), played from 12- and 6-man positions sampled from a CCRL database and from Kingbase (played out from the point of resignation), as well as from 12-, 6-, 5-, 4-, and 3-man semirandom positions. The positions are played both ways between sf9tb and the latest net at 0.25s vs 3200 nodes per move.
The training makes use of @borg's zero-history patch, with the added wrinkle that it is only applied 10% of the time. As a result, the net does well both with and without history. (See this GoNN page for thoughts on this approach.)
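A minimal sketch of the 10% idea on the input side, assuming the standard lc0 112-plane encoding (8 time steps of 13 planes each, then 8 meta planes); the actual patch may apply this elsewhere in the pipeline:

```python
import numpy as np

def maybe_zero_history(planes, p=0.10):
    """planes: (112, 8, 8) input tensor; with probability p, blank the
    seven history steps (planes 13..103) and keep only the current step."""
    if np.random.random() < p:
        planes = planes.copy()
        planes[13:104] = 0.0
    return planes
```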
Test suites aren't the be-all and end-all of testing, but Ender 5 has finally surpassed the 20b networks:
Ender 5 - 5 seconds, 2 threads
Success rate: 52.35% (78/149)
T902 - 5 seconds, 2 threads
Success rate: 45.64% (68/149)
Right now none of the Leela nets can do this.
12 man CCRL Elo difference: 61.08 +/- 24.17
12 man Kingbase Elo difference: 76.98 +/- 23.64
12 man semi random Elo difference: 10.43 +/- 47.67
6 man CCRL Elo difference: 36.62 +/- 35.66
6 man Kingbase Elo difference: 47.19 +/- 37.99
6 man semi random Elo difference: 10.43 +/- 61.08
5 man semi random Elo difference: 10.43 +/- 61.08
4 man semi random Elo difference: 0.00 +/- 54.85
3 man semi random Elo difference: 0.00 +/- 60.66
12 man CCRL Elo difference: 99.05 +/- 7.85
12 man Kingbase Elo difference: 99.65 +/- 7.75
12 man semi random Elo difference: 15.99 +/- 14.86
6 man CCRL Elo difference: 38.02 +/- 10.72
6 man Kingbase Elo difference: 50.56 +/- 11.37
6 man semi random Elo difference: 16.69 +/- 19.51
5 man semi random Elo difference: 13.56 +/- 18.61
4 man semi random Elo difference: 15.99 +/- 16.95
3 man semi random Elo difference: 0.00 +/- 19.49
My new (old) blog is at lczero.libertymedia.io