Leela Training
See the entry below for Leela endgame networks (Lender). I am now training 256x20b and 320x24b versions in parallel, but it is early days. There is also an Lc0 build, thanks to Hans Ekbrand (PR1452), that can use two nets and switch between them at N pieces. Lender training has slowed because I am working mostly on J94/J95 (value repair), which look promising.
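A minimal sketch of the switching idea, assuming the rule is simply "use the endgame net at or below N pieces". This is not the code or interface of Hans Ekbrand's build (PR1452); `count_pieces`, `choose_net`, the threshold of 18 and the net names are hypothetical.

```python
def count_pieces(fen: str) -> int:
    """Count all pieces (both colours, kings included) in the board field of a FEN."""
    board = fen.split()[0]
    # Digits are empty-square runs and '/' separates ranks; letters are pieces.
    return sum(1 for ch in board if ch.isalpha())


def choose_net(fen: str, main_net: str, endgame_net: str, switch_at: int) -> str:
    """Hypothetical rule: use the endgame net once the piece count is <= switch_at."""
    return endgame_net if count_pieces(fen) <= switch_at else main_net


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
endgame = "8/5k2/8/3P4/8/2K5/8/8 w - - 0 1"
print(count_pieces(start))  # 32
print(choose_net(endgame, "main-net", "endgame-net", switch_at=18))  # endgame-net
```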
J96 384x30b: value repair continuation of J92 with starting net J92-200. The earlier branch point (J92-200) gives me plenty of T60 games to mix in. After additional testing, I elected to start with LR 0.0001 after all. I will try to post a new net every 16k steps, which is about every 2 days. (J94 is terminated in favor of J96.)
Notes: the name J96.1-28 (below) is for my bookkeeping; it is the same net as the official release J96-28. J92-200 is the parent net for the J96 branch.
value-head only matches:
# PLAYER : RATING ERROR POINTS PLAYED (%) W L D D(%) CFS(%)
1 nets/J96.1-28 : 27.7 6.8 5393.5 10000 54 4778 3991 1231 12 100
2 nets/J92-200 : 0.0 ---- 4606.5 10000 46 3991 4778 1231 12 ---
policy-head only matches:
# PLAYER : RATING ERROR POINTS PLAYED (%) W L D D(%) CFS(%)
1 nets/J96.1-28 : 2.3 4.0 10066.0 20000 50 6323 6191 7486 37 87
2 nets/J92-200 : 0.0 ---- 9934.0 20000 50 6191 6323 7486 37 ---
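For a two-player match, the rating difference can be roughly checked from the score with the logistic Elo formula, Elo = 400 log10(p / (1 - p)). This is a sanity-check sketch only: the tables above come from a rating tool (presumably Ordo, judging by the columns) that fits its own model, so its numbers can differ slightly, e.g. 27.7 in the table versus about 27.4 from this formula.

```python
import math


def score_fraction(wins: int, losses: int, draws: int) -> float:
    """Score as a fraction of games played: a win is 1 point, a draw 1/2."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def score_to_elo(p: float) -> float:
    """Logistic Elo difference implied by an expected score p (0 < p < 1)."""
    return 400.0 * math.log10(p / (1.0 - p))


# J96.1-28 vs J92-200, value-head-only matches (W, L, D taken from the table).
p = score_fraction(4778, 3991, 1231)
print(round(p, 5), round(score_to_elo(p), 1))  # 0.53935 27.4
```

For the policy-head-only matches, 10066 points out of 20000 gives about +2.3 Elo, matching the table.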
J95 (testing the best proportion of value repair games) was, umm... complex, but suggests that 10 to 20% value repair games is best.
J94 384x30b: value repair continuation of J92 with starting net J92-320. Between 10 and 20% of training games are from value repair positions, with the remainder from ongoing T60 games. Training uses the same LR, window size, sampling rate and other settings as J92-320 did; the only difference is that value repair games replace some fraction of the usual T60 training games. See Value Repair method.
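The game mix can be pictured with a small sampling sketch. It is hypothetical throughout: `build_training_mix`, the string game IDs and the 15% fraction are illustrative assumptions, and this does not describe the actual Lc0 training pipeline.

```python
import random


def build_training_mix(t60_games, repair_games, repair_fraction, total, seed=0):
    """Hypothetical sketch: draw `total` games, of which about `repair_fraction`
    come from value repair positions and the rest from ongoing T60 games."""
    rng = random.Random(seed)
    n_repair = round(total * repair_fraction)
    n_t60 = total - n_repair
    # Sample without replacement from each pool, then shuffle them together.
    mix = rng.sample(repair_games, n_repair) + rng.sample(t60_games, n_t60)
    rng.shuffle(mix)
    return mix


t60 = [f"t60-{i}" for i in range(1000)]
repair = [f"vr-{i}" for i in range(200)]
mix = build_training_mix(t60, repair, repair_fraction=0.15, total=500)
print(sum(g.startswith("vr-") for g in mix))  # 75
```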
J94 will be on pause for a week or so from 2020-11-19 as I investigate the best value repair strategy at a higher LR (see previous section).
J92 384x30b: starting with net sv_384x30-t60-4300, training on T60 games from 2020-06-26 onward, at 30k steps per 4-million-game window. LR was 0.0006 from the start and changed to 0.0004 at step 85k. Q-ratio 0.0; has a moves left head (MLH).
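The step change in LR is a piecewise-constant schedule. A sketch, assuming the new value applies from the stated step onward; `piecewise_lr` and the per-run wrappers are hypothetical helpers, not part of the Lc0 training code.

```python
import bisect


def piecewise_lr(step, boundaries, values):
    """Piecewise-constant learning rate.

    values[0] applies before boundaries[0], values[i] from boundaries[i-1]
    (inclusive) up to boundaries[i], and values[-1] after the last boundary.
    """
    assert len(values) == len(boundaries) + 1
    return values[bisect.bisect_right(boundaries, step)]


def j92_lr(step):
    # J92: LR 0.0006 from the start, 0.0004 from step 85k onward.
    return piecewise_lr(step, [85_000], [0.0006, 0.0004])


def t60b7_lr(step):
    # T60B.7: same values, but the change came at step 90k.
    return piecewise_lr(step, [90_000], [0.0006, 0.0004])


print(j92_lr(84_999), j92_lr(85_000))  # 0.0006 0.0004
```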
Changed the game increment/training window up to J92-105 (no effect on sampling rate); will revert to the 4M game window after net 105. The window size change didn't seem to work well, so I am now calling that a branch (at the risk of confusion: only J92B-205, the old J92-205, is affected). I restarted training after J92-190 with the usual 4M window size and posted after 20k steps (J92-210). Training will continue as games come in.
The standard matches shown below will now be run only once every 20k or 30k steps, because gains are very slow at this point.
- J92-330
- J92-320
- J92-310
- J92-300
- J92-290
- J92-280
- J92-270
- J92-260
- J92-250
- J92-240
- J92-230
- J92-220
- J92-210
- J92B-205
- J92-190
- J92-180
- J92-170
- J92-160
- J92-145
- J92-130
- J92-115
- J92-100
- J92-70
Testing so far (1k nodes) vs Stockfish_20061707 (2.2M nodes), all default except 6-piece TB.
Next test J92-300.
# PLAYER : RATING ERROR PLAYED (%) W L D D(%) CFS(%)
1 lc0.net.J92-270 : 20.4 5.3 8000 53 2132 1678 4190 52 59
2 lc0.net.J92-190 : 19.5 5.3 8000 53 2069 1635 4296 54 63
3 lc0.net.J92-300 : 18.2 5.4 8000 53 2057 1651 4292 54 74
4 lc0.net.J92-210 : 15.8 5.2 8000 52 2007 1655 4338 54 66
5 lc0.net.J92B-205 : 14.2 5.3 8000 52 2026 1710 4264 53 51
6 lc0.net.J92-240 : 14.1 5.1 8000 52 1984 1670 4346 54 55
7 lc0.net.J92-180 : 13.6 5.1 8000 52 1990 1686 4324 54 56
8 lc0.net.J92-220 : 13.1 5.4 8000 52 2031 1739 4230 53 66
9 lc0.net.J92-145 : 11.5 5.4 8000 52 1963 1706 4331 54 72
10 lc0.net.J92-160 : 9.2 5.4 8000 51 1995 1789 4216 53 58
11 lc0.net.J92-130 : 8.5 5.3 8000 51 1987 1798 4215 53 61
12 lc0.net.J92-120 : 7.4 5.1 8000 51 1930 1764 4306 54 59
13 lc0.net.J92-115 : 6.6 5.3 8000 51 1971 1823 4206 53 55
14 lc0.net.J92-70 : 6.2 4.3 12000 51 2924 2716 6360 53 54
15 lc0.net.J92-100 : 5.9 4.3 12000 51 2891 2693 6416 53 78
16 lc0.net.J92-85 : 3.1 5.3 8000 50 1924 1854 4222 53 68
17 lc0.net.J92-55 : 1.3 5.4 8000 50 1912 1882 4206 53 69
18 SF11.5 : 0.0 ---- 192000 49 43243 46467 102290 53 80
19 lc0.net.J92-40 : -2.3 5.2 8000 50 1862 1913 4225 53 85
20 lc0.net.J92-25 : -6.3 5.1 8000 49 1794 1934 4272 53 56
21 lc0.net.J92-20 : -6.9 5.2 8000 49 1790 1943 4267 53 77
22 lc0.net.SV4300 : -9.5 5.2 8000 49 1770 1982 4248 53 62
23 lc0.net.SV4585 : -10.7 5.4 8000 49 1748 1987 4265 53 84
24 lc0.net.SV4619 : -14.6 5.3 8000 48 1710 2035 4255 53 ---
LJ2 320x24b: training from scratch on late T60 training positions with 20 or fewer pieces.
Note: after the LJ2-130 Lender net, training uses Value Repair games, which use Stockfish 12 to identify dubious Leela evaluations, so the nets are not zero.
LJ3 256x20b: training from scratch on late T60 training positions with 20 or fewer pieces.
Note: after the LJ3-140 Lender net, training uses Value Repair games, which use Stockfish 12 to identify dubious Leela evaluations, so the nets are not zero.
LJ1 192x20b (retired): was trained on late T60 training positions with 18 or fewer pieces left. "Opening" books appropriate for testing are on the jhorthos Opening Books Wiki (16-piece book).
Hmm, I had not realized that dkappe had already trained an Ender net of (nearly) this size.
Trained on T60 positions from after the MLH was added, with aggressive LR drops; already decent:
Test at a very short time control against the best T40 net, with a book of "openings" that have 16 pieces left:
# PLAYER : RATING ERROR POINTS PLAYED (%) W L D D(%) CFS(%)
1 lc0.net.42850 : 0.0 ---- 2537.0 4000 63 1335 261 2404 60 100
2 lc0.net.LJ1-50 : -78.6 9.0 779.5 2000 39 149 590 1261 63 100
3 lc0.net.LJ1-40 : -114.9 10.1 683.5 2000 34 112 745 1143 57 ---
T60B.7 320x24b: T60 branch (there were 6 previous test runs, hence the 7). Starting with net 63990, training on T60 games from 2020-06-26 onward, at LR 0.0006 and 30k steps per 4 million game window. Nearly identical to J92 except for the net size. LR changed to 0.0004 at step 90k.
Testing so far against Stockfish, 1k nodes vs. 1.4M nodes, all default except 6-piece TB:
SF11.5 = stockfish_20061707
# PLAYER : RATING ERROR PLAYED (%) W L D D(%) CFS(%)
1 lc0.net.T60B.7-105 : 20.8 5.6 8000 53 2336 1875 3789 47 99
2 lc0.net.T60B.7-90 : 11.7 5.5 8000 52 2255 1994 3751 47 50
3 lc0.net.T60B.7-60 : 11.7 5.5 8000 52 2219 1959 3822 48 66
4 lc0.net.T60B.7-45 : 10.0 5.7 8000 51 2236 2013 3751 47 92
5 lc0.net.T60B.7-30 : 4.2 5.5 8000 51 2167 2074 3759 47 93
6 SF11.5 : 0.0 ---- 56000 49 14273 15316 26411 47 96
7 lc0.net.T60B.7-15 : -5.0 5.6 8000 49 2076 2188 3736 47 64
8 lc0.net.63990 : -6.4 5.5 8000 49 2027 2170 3803 48 --- << parent net
J90 and J91 384x30b: with starting net sv_384x30-t60-3010, J90 is training on T60 games from 2020-03-09 onward, all at a lower LR than SV is currently using. The purpose is to test whether extended training at very low LR continues to improve a large network. It currently has no moves left head (MLH); the MLH will be added when the MLH training games are reached. I will post every 20k steps (4096 batch size) initially. Current LR is 0.0002, which might eventually be reduced to 0.0001 or lower. The training window is 10 million games, possibly growing to 20 million later. See J90 training parameters. Fraction correct (fc) is from a 1k node position test on 209k problems, with +/-0.002 expected sampling error (2 sd).
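The quoted sampling error is consistent with treating each of the 209k problems as an independent pass/fail trial (a binomial assumption). The fc values 0.5 and 0.7 below are illustrative, not measured results.

```python
import math


def two_sd(fc: float, n: int) -> float:
    """Two standard deviations of a binomial proportion fc measured on n problems."""
    return 2.0 * math.sqrt(fc * (1.0 - fc) / n)


for fc in (0.5, 0.7):
    print(fc, round(two_sd(fc, 209_000), 4))
# 0.5 0.0022
# 0.7 0.002
```

The error is largest at fc = 0.5 and shrinks as fc moves away from it, so +/-0.002 is a fair round figure across the plausible range.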
Training will continue on J91 (J90 is stopped). There should be a new 20k step network approximately every 2 days. Changes are slow but, I am fairly sure, real; after J91-100 I will post once every 50k steps.
J90 (q_ratio=0.2):
J91 (q_ratio=0.0):
192x16b: J64 is an extension of J20 training (J20 was trained on T40 games), trained on T60 games.
800 node match testing (J20-400 parent net):
# PLAYER : RATING ERROR POINTS PLAYED (%) W L D D(%) CFS(%)
1 lc0.net.J64-180 : 64.5 7.4 2357.0 4000 59 1215 501 2284 57 100
2 lc0.net.J64-140 : 49.9 7.0 2277.5 4000 57 1147 592 2261 57 58
3 lc0.net.J64-130 : 48.8 7.2 2271.5 4000 57 1134 591 2275 57 78
4 lc0.net.J64-150 : 44.9 6.8 2250.0 4000 56 1111 611 2278 57 70
5 lc0.net.J64-120 : 42.2 7.2 2235.0 4000 56 1125 655 2220 56 100
6 lc0.net.J20-400 : 0.0 ---- 8609.0 20000 43 2950 5732 11318 57 ---
256x20b: J48 is a follow-along training using T60 games as they progress. It is currently on LR 0.015 and is nearly up to date with T60 games. More nets will appear as T60 progresses.
320x24b: J13B.3 is from cyclical LR training on a megawindow of lowest LR T40 games. The cycle is 20k training steps long and has a high LR of 0.0005 and a low of 0.00002, with exponential decay from high to low point. The peak LR decays slightly each cycle. Starting at step 40k nets will be posted at each LR low point (one every 20k steps). COMPLETED
320x24b: J13B.4 is from cyclical LR training on a megawindow of lowest LR T40 games. It is similar to J13B.3 except that the peak LR is much higher (0.002) and the cycle is longer (30k steps).
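One way to read this schedule, as a sketch: within each cycle the LR decays exponentially from that cycle's peak towards the low point, and the peak shrinks by a constant factor each cycle. The `peak_decay` value of 0.95 is a hypothetical choice, since the text only says the peak decays "slightly".

```python
def cyclic_exp_lr(step, high, low, cycle, peak_decay=0.95):
    """Cyclical LR with exponential decay inside each cycle.

    Each cycle starts at its peak and decays exponentially towards `low`,
    approaching it at the end of the cycle. The peak of cycle k is
    high * peak_decay**k (peak_decay is a hypothetical value).
    """
    k, pos = divmod(step, cycle)
    peak = high * peak_decay ** k
    return peak * (low / peak) ** (pos / cycle)


# J13B.3: peak 0.0005, low 0.00002, 20k-step cycles.
print(cyclic_exp_lr(0, 0.0005, 0.00002, 20_000))       # start of the first cycle: 0.0005
print(cyclic_exp_lr(19_999, 0.0005, 0.00002, 20_000))  # just before the restart: about the low
# J13B.4, assuming it keeps the same 0.00002 low: peak 0.002, 30k-step cycles.
print(cyclic_exp_lr(45_000, 0.002, 0.00002, 30_000))
```

The low points fall just before each multiple of the cycle length, which is where the nets are posted.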
256x20b: T40B.1 is a continuation of T40 training with experimental training conditions. Starting net is 42770. All T40 training games from LR 0.0007 on were combined into one huge training window (about 12M games) and trained with a starting LR of 0.002, gradually decreasing to 0.00005. The hypothesis is that normal training on progressive windows learns positions from later games better than earlier ones, and that the megawindow approach will learn evenly (though it discards the benefit of reinforcement enjoyed in the parent T40 run). I also converted to a WDL head and did my usual partial 7-man tablebase rescoring, so these are possible confounders for interpretation. Nets are labeled T40B.1 to indicate branch 1 of T40 training. Pure zero (all training on T40 games). Low-node Elo estimates are now complete and shown below the net links. I removed the older nets; ping me on Discord if you want them (you don't, they aren't as good).
T40B.1 TRAINING FINISHED: net 106 is the last net.
800 fixed node matches for T40B.1 nets played against parent net 42770 (~best T40):
# PLAYER : RATING ERROR POINTS PLAYED (%) W L D
1 lc0.net.T40B-86 : 15.3 8.0 1564.5 3000 52 638 509 1853
2 lc0.net.T40B-94 : 14.7 8.0 1562.0 3000 52 662 538 1800
3 lc0.net.T40B-82 : 14.6 7.9 1561.5 3000 52 652 529 1819
4 lc0.net.T40B-106 : 14.1 7.7 1559.5 3000 52 661 542 1797
5 lc0.net.T40B-40 : 11.4 7.8 1548.0 3000 52 636 540 1824
6 lc0.net.T40B-78 : 10.9 7.9 1546.0 3000 52 651 559 1790
7 lc0.net.T40B-98 : 9.7 8.1 1541.0 3000 51 644 562 1794
8 lc0.net.T40B-102 : 7.5 7.8 1531.5 3000 51 651 588 1761
9 lc0.net.T40B-90 : 7.1 7.7 1530.0 3000 51 621 561 1818
.... (trimmed)
17 lc0.net.42770 : 0.0 ---- 24824.0 50800 49 9473 10625 30702
T40B.2 also used the megawindow approach of T40B.1, but with a cosine annealing cyclical LR. The method and logic are close to those described in https://towardsdatascience.com/https-medium-com-reina-wang-tw-stochastic-gradient-descent-with-restarts-5f511975163, except that the peak LR of each cycle is gradually decreased. The cycle is 12k steps long, repeated for 8 cycles, followed by 10k steps at the lowest LR. Only the final net is posted, as it appears to be the best based on 800 node tests. No obvious improvement on B.1.
800 fixed node matches for best T40B.2 net against parent net 42770 (~best T40):
# PLAYER : RATING ERROR POINTS PLAYED (%) W L D CFS(%)
1 lc0.net.T40B.2-106 : 15.2 7.6 1564.0 3000 52 642 514 1844 100
2 lc0.net.42770 : 0.0 ---- 1436.0 3000 48 514 642 1844 ---
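A sketch of cosine annealing with warm restarts and a shrinking peak, following the T40B.2 cycle structure (12k-step cycles, 8 cycles, then the lowest LR). The LR bounds and `peak_decay` are not given in the text; the 0.002/0.00005 values are borrowed from T40B.1 purely for illustration, and scaling the peak's height above `lr_min` is one hypothetical way to lower the peak.

```python
import math


def sgdr_lr(step, lr_max, lr_min, cycle=12_000, n_cycles=8, peak_decay=0.9):
    """Cosine annealing with warm restarts and a shrinking peak.

    Cycle k anneals from its peak down towards lr_min along half a cosine;
    the peak's height above lr_min is scaled by peak_decay**k (a hypothetical
    choice). After n_cycles cycles the LR stays at lr_min.
    """
    k, pos = divmod(step, cycle)
    if k >= n_cycles:
        return lr_min  # final phase: 10k steps at the lowest LR
    peak = lr_min + (lr_max - lr_min) * peak_decay ** k
    return lr_min + 0.5 * (peak - lr_min) * (1.0 + math.cos(math.pi * pos / cycle))


total_steps = 8 * 12_000 + 10_000  # 106k steps in all
print(sgdr_lr(0, 0.002, 0.00005), sgdr_lr(100_000, 0.002, 0.00005))
```

The 106k-step total lines up with the posted net number 106, if nets are numbered in thousands of steps (an assumption).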
T40B.3 was similar to B.1 except for a bigger training batch size. This did not appear to improve on B.1, and no nets are posted.
T40B.4 is similar to B.1 except for a smaller training batch size. Now complete; it is possible there was some overfitting after net 200. [nota bene: there was a clerical error in Elo assignments, now fixed]
- T40B.4-260 : +9 (+/-8) Elo to 42770 in 800 node test
- T40B.4-240 : +13 (+/-8) Elo to 42770 in 800 node test
- T40B.4-220 : +11 (+/-8) Elo ditto
- T40B.4-200 : +19 (+/-8) Elo ditto
- T40B.4-180 : +11 (+/-8) Elo ditto
- T40B.4-160 : +21 (+/-8) Elo ditto
192x16b: 192 filters, 16 blocks, SE (ratio 6), WDL value head, conv policy head, trained on T40 games
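The SE (squeeze-and-excitation) ratio sets the bottleneck width of the SE unit as filters / ratio, which gives 32 for 192/6, and likewise for 320/10 and 256/8. Below is a plain-Python sketch of the standard SE block from Hu et al.; Lc0's own SE unit may differ in details (an assumption worth checking against the Lc0 source), and the weights are placeholders.

```python
import math


def se_width(filters: int, ratio: int) -> int:
    """Width of the SE bottleneck layer: filters // ratio."""
    return filters // ratio


def se_block(x, w1, b1, w2, b2):
    """Standard squeeze-and-excitation over x[channel][square].

    w1 has C//r rows of length C; w2 has C rows of length C//r.
    Returns x with each channel rescaled by a gate in (0, 1).
    """
    # Squeeze: average each channel over the board.
    s = [sum(ch) / len(ch) for ch in x]
    # Excitation: FC down to C//r with ReLU, then FC back up to C with sigmoid.
    h = [max(0.0, sum(w * v for w, v in zip(row, s)) + b) for row, b in zip(w1, b1)]
    g = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, h)) + b)))
         for row, b in zip(w2, b2)]
    # Scale: multiply every value in channel c by its gate g[c].
    return [[gc * v for v in ch] for gc, ch in zip(g, x)]


print(se_width(192, 6), se_width(320, 10), se_width(256, 8))  # 32 32 32
```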
320x24b: 320 filters, 24 blocks, SE (ratio 10), WDL value head, conv policy head, trained on T40 games, training complete. Also known as the 'Terminator' series. Removed rarely used nets; available on request. 410 is likely the best.
last J13 net at LR 0.2: J13-50
J13B.2: an attempt to improve on J13 using the same method as for T40B.4 (there is a J13B.1, but it wasn't very successful and no nets are posted). Test results are listed after the nets.
3000 fixed node tests against parent net:
# PLAYER : RATING ERROR POINTS PLAYED (%) W L D
1 lc0.net.J13B.2-178 : 11.4 6.2 1548.0 3000 52 425 329 2246
2 lc0.net.J13B.2-136 : 11.1 6.3 1547.0 3000 52 405 311 2284
3 lc0.net.J13B.2-148 : 10.1 6.3 1542.5 3000 51 426 341 2233
4 lc0.net.J13B.2-200 : 10.1 6.2 1542.5 3000 51 427 342 2231
5 lc0.net.J13B.2-188 : 9.2 6.3 1539.0 3000 51 410 332 2258
6 lc0.net.J13B.2-168 : 8.2 6.4 1534.5 3000 51 408 339 2253
7 lc0.net.J13B.2-220 : 7.6 6.4 1472.0 2882 51 401 339 2142
8 lc0.net.J13B.2-120 : 7.3 6.2 1531.0 3000 51 421 359 2220
9 lc0.net.J13B.2-158 : 3.6 6.3 1515.0 3000 51 390 360 2250
10 lc0.net.J13-410 : 0.0 ---- 13110.5 26882 49 3052 3713 20117
256x16b: 256 filters, 16 blocks, SE (ratio 8), WDL value head, conv policy head, trained on T40 games (training on hold to favor other experiments)
- J17-190 mid-training