Endgame Net

dkappe edited this page Sep 16, 2018 · 33 revisions

Latest release

Latest News

I've modified lc0 to take a file of EPDs as starting positions. I'm now feeding it the same data as the adversarial play against sf9tb and alternating training on batches of 20k games. A 64x6 net really breezes through 20k endgame positions.

The command line for self-play is:

./lc0.ender selfplay --training --games=20000 -w ender-latest.txt.gz --visits=800 --cpuct=5.0 --resign-percentage=0

Latest Training

It’s changed a bit. I am using temp=1 in self-play from 20k randomized 12-man EPDs. I use noise for adversarial play with sf9tb. The nodes there are 350k for sf and 1600 for lc0.

I feed 2 adversarial batches for every 1 self play batch.
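
A minimal sketch of that 2:1 feeding ratio, assuming the batches are simply interleaved in a fixed repeating order (the order within each cycle and the names `batch_schedule`, "adversarial", and "selfplay" are hypothetical, not taken from the actual training scripts):

```python
from itertools import cycle, islice


def batch_schedule(adversarial_per_selfplay=2):
    """Return an endless iterator of batch sources.

    Assumed order: N adversarial batches, then 1 self-play batch, repeated.
    """
    pattern = ["adversarial"] * adversarial_per_selfplay + ["selfplay"]
    return cycle(pattern)


# Show the sources of the first six batches fed to training.
first_six = list(islice(batch_schedule(), 6))
print(first_six)
```

Any ordering that keeps two adversarial batches per self-play batch would satisfy the ratio; the fixed cycle is just the simplest one to write down.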

I’m going to play a match from 100 12-man EPD positions, with colors reversed, against 11258.

T11258 - 5 sec, 2 thr, no prune
Success rate: 53.69% (80/149)
Ender 38 - 5 sec, 2 thr, no prune
Success rate: 62.42% (93/149)

Based on my most recent test suite run, I am hopeful.
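
For reference, the success rates above are just solved positions over the 149 positions in the suite. A small sketch that reproduces the percentages from the raw counts (the function name `success_rate` is only illustrative):

```python
def success_rate(solved, total):
    """Return the percentage of test positions solved."""
    return 100.0 * solved / total


# Counts taken from the test suite results above.
results = {"T11258": (80, 149), "Ender 38": (93, 149)}

for name, (solved, total) in results.items():
    print(f"{name}: {success_rate(solved, total):.2f}% ({solved}/{total})")
```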

History

The Ender net (64x6) was initially trained on ~400k semirandom 6, 5, 4, and 3 man positions with perfect playouts by sf9tb. This led to mediocre play.

Currently the net is being trained on 20k batches of playouts (500k window), played from 12 and 6 man positions sampled from a CCRL database, Kingbase played out from resignation, as well as 12, 6, 5, 4, and 3 man semirandom positions. The positions are played both ways between sf9tb and the latest net at 0.25s vs 3200 nodes per move.

The training makes use of @borg's zero history patch, with the added wrinkle that it is only applied 10% of the time. As a result, the net does well with and without history. (See this GoNN page for thoughts on this approach.)
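
A rough sketch of the "only 10% of the time" idea, not the actual patch. It assumes each training sample is a list of per-position plane groups with the current position first and older positions after it; that layout and the name `maybe_zero_history` are assumptions made for illustration:

```python
import random


def maybe_zero_history(planes_per_position, zero_prob=0.1, rng=random):
    """With probability zero_prob, blank out every history position.

    planes_per_position: list of plane groups, index 0 is the current
    position and the remaining entries are earlier positions (assumed layout).
    Returns a new list when zeroing; the input is never modified.
    """
    if rng.random() >= zero_prob:
        return planes_per_position  # keep full history most of the time
    current, *history = planes_per_position
    zeroed = [[0] * len(group) for group in history]
    return [current] + zeroed


# Toy sample: three positions with two "planes" each.
sample = [[1, 1], [1, 0], [0, 1]]
print(maybe_zero_history(sample, rng=random.Random(0)))
```

Mixing both kinds of samples is what lets a single net see positions with and without history during training.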

Test Suite

Test suites aren't the be-all and end-all of testing, but Ender 5 has finally surpassed the 20b networks:

Ender 5 - 5 seconds, 2 threads
Success rate: 52.35% (78/149)
T902 - 5 seconds, 2 threads
Success rate: 45.64% (68/149)

First Goal: win KQvKR

Right now none of the Leela nets can do this.

ID512 over 1000 positions (2000 playouts)

12 man CCRL        Elo difference: 61.08 +/- 24.17
12 man Kingbase    Elo difference: 76.98 +/- 23.64
12 man semi random Elo difference: 10.43 +/- 47.67
6 man CCRL         Elo difference: 36.62 +/- 35.66
6 man Kingbase     Elo difference: 47.19 +/- 37.99
6 man semi random  Elo difference: 10.43 +/- 61.08
5 man semi random  Elo difference: 10.43 +/- 61.08
4 man semi random  Elo difference: 0.00 +/- 54.85
3 man semi random  Elo difference: 0.00 +/- 60.66

Ender 3 (trained on batch 3) over 10k positions (20k playouts)

12 man CCRL        Elo difference: 99.05 +/- 7.85
12 man Kingbase    Elo difference: 99.65 +/- 7.75
12 man semi random Elo difference: 15.99 +/- 14.86
6 man CCRL         Elo difference: 38.02 +/- 10.72
6 man Kingbase     Elo difference: 50.56 +/- 11.37
6 man semi random  Elo difference: 16.69 +/- 19.51
5 man semi random  Elo difference: 13.56 +/- 18.61
4 man semi random  Elo difference: 15.99 +/- 16.95
3 man semi random  Elo difference: 0.00 +/- 19.49
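
For readers unfamiliar with the "Elo difference" figures, here is a sketch of the standard logistic conversion from a match score to an Elo difference. It is assumed, not confirmed, that the tool producing the numbers above uses this formula; the error margins are not reproduced here.

```python
import math


def elo_difference(score):
    """Convert a score fraction (wins + draws/2) / games into an Elo difference.

    Uses the standard logistic model: elo = -400 * log10(1/score - 1).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# An even score means no Elo difference; higher scores map to positive Elo.
for score in (0.50, 0.515, 0.60):
    print(f"score {score:.3f} -> {elo_difference(score):+.2f} Elo")
```

Under this model a score of exactly 50% gives 0.00, which matches how the 3 and 4 man rows read when the two sides split the points evenly.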