
Leela Training


Network Links


See the entry below for Leela endgame networks (Lender). There is also an Lc0 build that can use two nets and switch at N pieces, thanks to Hans Ekbrand (PR1452). Lender training has slowed; I am working mostly on J94/J104 (value repair for large and small nets), which look promising.
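The switching idea is straightforward; here is a minimal Python sketch using python-chess (only an illustration of the concept, not the actual implementation from PR1452, and the net file names are hypothetical):

```python
import chess

def pick_net(board: chess.Board, n_pieces: int = 20) -> str:
    """Select the endgame (Lender) net once the total piece count,
    kings and pawns included, drops to n_pieces or fewer."""
    if chess.popcount(board.occupied) <= n_pieces:
        return "lender.pb.gz"   # hypothetical endgame net file
    return "main.pb.gz"         # hypothetical main net file

board = chess.Board("8/5k2/8/8/3K4/8/3P4/8 w - - 0 1")  # 3 pieces
print(pick_net(board))  # -> lender.pb.gz
```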


J94 384x30b: value repair continuation of J92 with starting net J92-320. Between 10 and 20% of training games are from value repair positions, with the remainder from ongoing T60 games. Current training uses the same LR, window size, sampling rate and other settings as J92-320 did, the only difference being the substitution of value repair games for some fraction of the usual T60 training games. [Value Repair method](https://github.com/jhorthos/lczero-training/wiki/Value-Repair-method)
Note: J94-120 and J94-130 are from the same value repair game set. Preliminary Elo tests suggest at best a modest improvement over J94-100.
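As a sketch of the game-mix described above (all names here are hypothetical; the real pipeline is lczero-training's chunk loader, not this):

```python
import random

def sample_training_game(t60_games, value_repair_games, repair_fraction=0.15):
    """Draw one training game: with probability repair_fraction use a game
    initiated from a value repair position, otherwise a normal T60 game.
    A repair_fraction of 0.10-0.20 matches the 10-20% mix described above."""
    if random.random() < repair_fraction:
        return random.choice(value_repair_games)
    return random.choice(t60_games)
```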


J104 128x10b: value repair with starting net 703810. Parameters for collecting value repair games are now optimized and this run uses a 2:1 ratio of near-end T70 training games to games initiated from value repair positions.


J98 384x30b: value repair continuation with starting net J92-320. Details to come.


J96 384x30b: value repair continuation of J92 with starting net J92-200. The earlier branch point (J92-200) gives me plenty of T60 games to mix in. After additional testing, I elected to start with LR 0.0001 after all. I will try to post new nets about every 2 days (every 16k steps). (J94 is terminated in favor of J96.) [Value Repair method](https://github.com/jhorthos/lczero-training/wiki/Value-Repair-method)

An interesting apparent limit in value head gains was hit; see the table below. Based on preliminary tests, I think this is because the ratio of T60 to value repair games should be higher (since each value repair game effectively contributes only one repair position, this is not so surprising). I am testing further now, and seeing whether I can generate the value repair positions faster, since that will be the limiting factor.
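To see how dilute the repair signal is per position, a back-of-the-envelope check (the positions-per-game figure is an assumption for illustration):

```python
# With 15% of training games coming from value repair positions, but each
# such game contributing only 1 repaired position out of ~150 plies
# (assumed game length), the fraction of sampled *positions* that carry
# the repair signal is tiny:
repair_game_fraction = 0.15
positions_per_game = 150                 # assumption for illustration
effective = repair_game_fraction / positions_per_game
print(f"{effective:.4%}")                # -> 0.1000% of positions
```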

Notes: the J96.1 labels are for my bookkeeping; they are the same as the official J96 release nets. J92-200 is the parent net of the J96 branch.

value-head only matches:

   # PLAYER           :  RATING  ERROR   POINTS  PLAYED   (%)      W      L      D  D(%)  CFS(%)
   1 nets/J96.1-16    :    33.6    6.8   5477.5   10000    55   4874   3919   1207    12      59
   2 nets/J96.1-6     :    32.6    6.7   5462.5   10000    55   4865   3940   1195    12      51
   3 nets/J96.1-36    :    32.5    6.6   5461.0   10000    55   4930   4008   1062    11      58
   4 nets/J96.1-20    :    31.5    6.6   5448.0   10000    54   4847   3951   1202    12      72
   5 nets/J96.1-52    :    28.7    6.7   5407.5   10000    54   4853   4038   1109    11      52
   6 nets/J96.1-44    :    28.4    6.6   5403.5   10000    54   4861   4054   1085    11      56
   7 nets/J96.1-28    :    27.7    6.4   5393.5   10000    54   4778   3991   1231    12      56
   8 nets/J96.1-30    :    27.0    6.5   5383.5   10000    54   4822   4055   1123    11      56
   9 nets/J96.1-12    :    26.3    6.6   5374.0   10000    54   4711   3963   1326    13      70
  10 nets/J96.1-10    :    23.8    6.7   5339.0   10000    53   4703   4025   1272    13      59
  11 nets/J96.1-11    :    22.7    6.8   5323.5   10000    53   4666   4019   1315    13      70
  12 nets/J96.1-9     :    20.1    6.6   5285.5   10000    53   4668   4097   1235    12      53
  13 nets/J96.1-2     :    19.8    6.5   5281.0   10000    53   4675   4113   1212    12      61
  14 nets/J96.1-4     :    18.4    6.6   5262.5   10000    53   4606   4081   1313    13      54
  15 nets/J96.1-1     :    18.0    7.0   5256.0   10000    53   4647   4135   1218    12      54
  16 nets/J96.1-3     :    17.5    6.5   5249.0   10000    52   4648   4150   1202    12     100
  17 nets/J92-200     :     0.0   ----  74192.5  160000    46  64539  76154  19307    12     ---

policy-head only matches:

   # PLAYER           :  RATING  ERROR   POINTS  PLAYED   (%)     W     L     D  D(%)  CFS(%)
   1 nets/J96.1-28    :     2.3    4.0  10066.0   20000    50  6323  6191  7486    37      87
   2 nets/J92-200     :     0.0   ----   9934.0   20000    50  6191  6323  7486    37     ---

J95 (testing the best proportion of value repair games) was, umm... complex, but it suggests that 10 to 20% value repair games is best.


J92 384x30b: starting with net sv_384x30-t60-4300, training on T60 games from 2020-06-26 onward, at LR 0.0006 and 30k steps per 4-million-game window. LR was 0.0006 from the start, changed to 0.0004 at step 85k. Q-ratio 0.0; has an MLH (moves left head).
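The LR schedule above is a simple step function; a minimal sketch (the batch size is an assumption, borrowed from the 4096 mentioned for J90 below):

```python
def j92_lr(step: int) -> float:
    """Step LR schedule for J92: 0.0006 from the start, 0.0004 from step 85k."""
    return 0.0006 if step < 85_000 else 0.0004

# Rough sampling rate implied by 30k steps per 4M-game window,
# assuming a 4096-position batch:
samples_per_game = 30_000 * 4096 / 4_000_000   # ~30.7 positions per game
```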

Changed the game increment/training window up to J92-105 (no effect on sampling rate), planning to revert to the 4M game window after net 105. The window size change didn't seem to work well, so I am now calling that a branch (at the risk of confusion: only J92B-205 = the old J92-205 is affected). Restarted training after J92-190 with the usual 4M window size, and posted after 20k steps (J92-210). Will continue as games come in.

Standard matches, shown below, will now be played only once every 20k or 30k steps; gains are very slow at this point.

Testing so far (1k nodes) vs Stockfish_20061707 (2.2M nodes), all default except 6-piece TB.
Next test J92-300.

   # PLAYER              :  RATING  ERROR  PLAYED   (%)      W      L       D  D(%)  CFS(%)
   1 lc0.net.J92-270     :    20.4    5.3    8000    53   2132   1678    4190    52      59
   2 lc0.net.J92-190     :    19.5    5.3    8000    53   2069   1635    4296    54      63
   3 lc0.net.J92-300     :    18.2    5.4    8000    53   2057   1651    4292    54      74
   4 lc0.net.J92-210     :    15.8    5.2    8000    52   2007   1655    4338    54      66
   5 lc0.net.J92B-205    :    14.2    5.3    8000    52   2026   1710    4264    53      51
   6 lc0.net.J92-240     :    14.1    5.1    8000    52   1984   1670    4346    54      55
   7 lc0.net.J92-180     :    13.6    5.1    8000    52   1990   1686    4324    54      56
   8 lc0.net.J92-220     :    13.1    5.4    8000    52   2031   1739    4230    53      66
   9 lc0.net.J92-145     :    11.5    5.4    8000    52   1963   1706    4331    54      72
  10 lc0.net.J92-160     :     9.2    5.4    8000    51   1995   1789    4216    53      58
  11 lc0.net.J92-130     :     8.5    5.3    8000    51   1987   1798    4215    53      61
  12 lc0.net.J92-120     :     7.4    5.1    8000    51   1930   1764    4306    54      59
  13 lc0.net.J92-115     :     6.6    5.3    8000    51   1971   1823    4206    53      55
  14 lc0.net.J92-70      :     6.2    4.3   12000    51   2924   2716    6360    53      54
  15 lc0.net.J92-100     :     5.9    4.3   12000    51   2891   2693    6416    53      78
  16 lc0.net.J92-85      :     3.1    5.3    8000    50   1924   1854    4222    53      68
  17 lc0.net.J92-55      :     1.3    5.4    8000    50   1912   1882    4206    53      69
  18 SF11.5              :     0.0   ----  192000    49  43243  46467  102290    53      80
  19 lc0.net.J92-40      :    -2.3    5.2    8000    50   1862   1913    4225    53      85
  20 lc0.net.J92-25      :    -6.3    5.1    8000    49   1794   1934    4272    53      56
  21 lc0.net.J92-20      :    -6.9    5.2    8000    49   1790   1943    4267    53      77
  22 lc0.net.SV4300      :    -9.5    5.2    8000    49   1770   1982    4248    53      62
  23 lc0.net.SV4585      :   -10.7    5.4    8000    49   1748   1987    4265    53      84
  24 lc0.net.SV4619      :   -14.6    5.3    8000    48   1710   2035    4255    53     ---

LJ2 320x24b: training from scratch on late T60 training positions with 20 or fewer pieces.
Note: after Lender net LJ2-130, training uses value repair games, which use Stockfish 12 to identify dubious Leela evaluations, so the nets are not zero.


LJ3 256x20b: training from scratch on late T60 training positions with 20 or fewer pieces.
Note: after Lender net LJ3-140, training uses value repair games, which use Stockfish 12 to identify dubious Leela evaluations, so the nets are not zero.
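The LJ2/LJ3 position filter is just a piece-count cutoff; a minimal sketch with python-chess (function name hypothetical):

```python
import chess

def keep_for_lender(fen: str, max_pieces: int = 20) -> bool:
    """Keep a training position only if it has max_pieces or fewer men
    on the board, kings and pawns included (as for LJ2/LJ3 above)."""
    return len(chess.Board(fen).piece_map()) <= max_pieces
```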


LJ1 192x20b (retired): was trained on late T60 training positions with 18 or fewer pieces left. "Opening" books appropriate for testing are at the jhorthos Opening Books wiki (16-piece book).
Hmm, I had not realized that dkappe had already trained an Ender net of (nearly) this size.

Trained on T60 positions from after the MLH was added, with aggressive LR drops; already decent:

Tested at a very short time control against the best T40 net, with a book of 16-pieces-left "openings":

   # PLAYER            :  RATING  ERROR  POINTS  PLAYED   (%)     W    L     D  D(%)  CFS(%)
   1 lc0.net.42850     :     0.0   ----  2537.0    4000    63  1335  261  2404    60     100
   2 lc0.net.LJ1-50    :   -78.6    9.0   779.5    2000    39   149  590  1261    63     100
   3 lc0.net.LJ1-40    :  -114.9   10.1   683.5    2000    34   112  745  1143    57     ---

T60B.7 320x24b: T60 branch (there were 6 previous test runs, hence the 7). Starting with net 63990, training on T60 games from 2020-06-26 onward, at LR 0.0006 and 30k steps per 4 million game window. Nearly identical to J92 except for the net size. LR changed to 0.0004 at step 90k.

Testing so far against Stockfish, 1k nodes vs. 1.4M nodes, all default except 6-piece TB:

SF11.5 = stockfish_20061707
   # PLAYER                :  RATING  ERROR  PLAYED   (%)      W      L      D  D(%)  CFS(%)
   1 lc0.net.T60B.7-105    :    20.8    5.6    8000    53   2336   1875   3789    47      99
   2 lc0.net.T60B.7-90     :    11.7    5.5    8000    52   2255   1994   3751    47      50
   3 lc0.net.T60B.7-60     :    11.7    5.5    8000    52   2219   1959   3822    48      66
   4 lc0.net.T60B.7-45     :    10.0    5.7    8000    51   2236   2013   3751    47      92
   5 lc0.net.T60B.7-30     :     4.2    5.5    8000    51   2167   2074   3759    47      93
   6 SF11.5                :     0.0   ----   56000    49  14273  15316  26411    47      96
   7 lc0.net.T60B.7-15     :    -5.0    5.6    8000    49   2076   2188   3736    47      64
   8 lc0.net.63990         :    -6.4    5.5    8000    49   2027   2170   3803    48     ---  << parent net


J90 and J91 384x30b: with starting net sv_384x30-t60-3010, J90 is training on T60 games from 2020-03-09 onward, all at lower LR than SV is currently using. The purpose is to test whether extended training at very low LR continues to improve a large network. Currently has no moves left head; MLH will be added when the MLH training games are reached. I will post every 20k steps (4096 batch size) initially. Current LR is 0.0002, which might eventually be reduced to 0.0001 or lower. Training window is 10 million games, possibly growing to 20 million later. J90 training parameters. Fraction correct (fc) is from a 1k node position test on 209k problems, with +/-0.002 expected sampling error (2 sd).
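The quoted +/-0.002 is just the 2-sigma binomial sampling error on 209k independent trials, as a quick check shows:

```python
from math import sqrt

n = 209_000                      # test positions
p = 0.5                          # worst case for binomial variance
two_sd = 2 * sqrt(p * (1 - p) / n)
print(round(two_sd, 4))          # -> 0.0022, matching the quoted +/-0.002
```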

Training will continue on J91 (J90 is stopped). There should be a new 20k-step network approximately every 2 days. Changes are slow, but I am fairly sure they are real; I will post once every 50k steps after J91-100.

J90 (q_ratio=0.2):

J91 (q_ratio=0.0):
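For reference, q_ratio controls how the value-head training target blends the root search evaluation q with the final game result z; a minimal sketch of that blend as I understand the lczero-training convention (treat the exact semantics as an assumption):

```python
def value_target(q: float, z: float, q_ratio: float) -> float:
    """Blend the root search eval q with the game result z, both in [-1, 1].
    q_ratio=0.0 (J91, J92) trains on the pure game result;
    q_ratio=0.2 (J90) mixes in 20% of the search eval."""
    return q_ratio * q + (1.0 - q_ratio) * z
```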


192x16b: J64 is an extension of J20 training (which was trained on T40 games), now trained on T60 games.

800 node match testing (J20-400 parent net):

   # PLAYER             :  RATING  ERROR  POINTS  PLAYED   (%)     W     L      D  D(%)  CFS(%)
   1 lc0.net.J64-180    :    64.5    7.4  2357.0    4000    59  1215   501   2284    57     100
   2 lc0.net.J64-140    :    49.9    7.0  2277.5    4000    57  1147   592   2261    57      58
   3 lc0.net.J64-130    :    48.8    7.2  2271.5    4000    57  1134   591   2275    57      78
   4 lc0.net.J64-150    :    44.9    6.8  2250.0    4000    56  1111   611   2278    57      70
   5 lc0.net.J64-120    :    42.2    7.2  2235.0    4000    56  1125   655   2220    56     100
   6 lc0.net.J20-400    :     0.0   ----  8609.0   20000    43  2950  5732  11318    57     ---

256x20b: J48 is a follow-along training run using T60 games as they progress. It is currently at LR 0.015 and nearly up to date with T60 games. More nets will appear as T60 progresses.


320x24b: J13B.3 is from cyclical LR training on a megawindow of lowest LR T40 games. The cycle is 20k training steps long and has a high LR of 0.0005 and a low of 0.00002, with exponential decay from high to low point. The peak LR decays slightly each cycle. Starting at step 40k nets will be posted at each LR low point (one every 20k steps). COMPLETED
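A minimal sketch of the J13B.3 cycle shape (the per-cycle peak decay factor is an assumption; the text only says the peak decays slightly):

```python
def j13b3_lr(step: int, cycle_len: int = 20_000,
             lr_high: float = 0.0005, lr_low: float = 0.00002,
             peak_decay: float = 0.95) -> float:
    """Cyclical LR: exponential decay from the (slowly decaying) peak
    down to lr_low within each cycle_len-step cycle."""
    cycle, pos = divmod(step, cycle_len)
    peak = lr_high * peak_decay ** cycle        # peak shrinks each cycle
    return peak * (lr_low / peak) ** (pos / cycle_len)
```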


320x24b: J13B.4 is from cyclical LR training on a megawindow of lowest LR T40 games. It is similar to J13B.3 except that the peak LR is much higher (0.002) and the cycle is longer (30k steps).


256x20b: T40B.1 is a continuation of T40 training with experimental training conditions. Starting net is 42770. All T40 training games from LR 0.0007 on were combined into one huge training window (about 12M games) and trained with a starting LR of 0.002 gradually decreasing to 0.00005. The hypothesis is that normal training on progressive windows learns positions from later games better than earlier ones, and that the megawindow approach will learn evenly (though it discards the benefit of reinforcement enjoyed in the parent T40 run). I also converted to a WDL head and did my usual partial 7-man tablebase rescoring, so these are possible confounders for interpretation. Nets are labeled T40B.1 to indicate branch 1 of T40 training. Pure zero (all training on T40 games). Low-node Elo estimates are now complete and shown below the net links. I removed the older nets - ping me on Discord if you want them (you don't, they aren't as good).

T40B.1 TRAINING FINISHED: net 106 is the last net.

800 fixed node matches for T40B.1 nets played against parent net 42770 (~best T40):

   # PLAYER              :  RATING  ERROR   POINTS  PLAYED   (%)     W      L      D
   1 lc0.net.T40B-86     :    15.3    8.0   1564.5    3000    52   638    509   1853
   2 lc0.net.T40B-94     :    14.7    8.0   1562.0    3000    52   662    538   1800
   3 lc0.net.T40B-82     :    14.6    7.9   1561.5    3000    52   652    529   1819
   4 lc0.net.T40B-106    :    14.1    7.7   1559.5    3000    52   661    542   1797
   5 lc0.net.T40B-40     :    11.4    7.8   1548.0    3000    52   636    540   1824
   6 lc0.net.T40B-78     :    10.9    7.9   1546.0    3000    52   651    559   1790
   7 lc0.net.T40B-98     :     9.7    8.1   1541.0    3000    51   644    562   1794
   8 lc0.net.T40B-102    :     7.5    7.8   1531.5    3000    51   651    588   1761
   9 lc0.net.T40B-90     :     7.1    7.7   1530.0    3000    51   621    561   1818
  .... (trimmed)
  17 lc0.net.42770       :     0.0   ----  24824.0   50800    49  9473  10625  30702


T40B.2 also used the megawindow approach of T40B.1, but with a cosine annealing cyclical LR. The method and logic are close to those described in https://towardsdatascience.com/https-medium-com-reina-wang-tw-stochastic-gradient-descent-with-restarts-5f511975163, except that the peak LR of each cycle is gradually decreased. The cycle is 12k steps long, for 8 cycles, followed by 10k steps at the lowest LR. Only the final net is posted, as it appears to be the best based on 800-node tests. No obvious improvement over B.1.
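A minimal sketch of this schedule, assuming an SGDR-style cosine shape with the peak reduced at each restart (the high/low LRs and the decay factor here are illustrative assumptions):

```python
from math import cos, pi

def t40b2_lr(step: int, cycle_len: int = 12_000,
             lr_high: float = 0.002, lr_low: float = 0.00002,
             peak_decay: float = 0.9) -> float:
    """Cosine annealing with warm restarts: each cycle anneals from the
    current peak down to lr_low, and the peak shrinks every restart."""
    cycle, pos = divmod(step, cycle_len)
    peak = lr_high * peak_decay ** cycle
    return lr_low + 0.5 * (peak - lr_low) * (1 + cos(pi * pos / cycle_len))
```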

800 fixed node matches for best T40B.2 net against parent net 42770 (~best T40):

   # PLAYER                :  RATING  ERROR  POINTS  PLAYED   (%)    W    L     D  CFS(%)
   1 lc0.net.T40B.2-106    :    15.2    7.6  1564.0    3000    52  642  514  1844     100
   2 lc0.net.42770         :     0.0   ----  1436.0    3000    48  514  642  1844     ---

T40B.3 was similar to B.1 except for a bigger training batch size. It did not appear to improve on B.1, so no nets are posted.


T40B.4 is similar to B.1 except for a smaller training batch size. Now complete; it is possible there was some overfitting after net 200. [Nota bene: there was a clerical error in the Elo assignments, now fixed.]


192x16b: 192 filters, 16 blocks, SE (ratio 6), WDL value head, conv policy head, trained on T40 games


320x24b: 320 filters, 24 blocks, SE (ratio 10), WDL value head, conv policy head, trained on T40 games; training complete. Also known as the 'Terminator' series. Rarely used nets have been removed - available by request. 410 is likely the best.
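For reference, here is the SE part of these residual blocks in the standard Squeeze-and-Excitation formulation, where the bottleneck width is the filter count divided by the SE ratio (e.g. 320/10 = 32 here); lc0's exact variant may differ, so treat this as a sketch:

```python
import tensorflow as tf

def se_block(x: tf.Tensor, filters: int, ratio: int) -> tf.Tensor:
    """Squeeze-and-Excitation: pool over the 8x8 board, squeeze to
    filters//ratio units, then gate each channel with a sigmoid."""
    s = tf.keras.layers.GlobalAveragePooling2D()(x)            # squeeze
    s = tf.keras.layers.Dense(filters // ratio, activation="relu")(s)
    s = tf.keras.layers.Dense(filters, activation="sigmoid")(s)
    s = tf.keras.layers.Reshape((1, 1, filters))(s)
    return x * s                                   # channel-wise rescale
```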

last J13 net at LR 0.2: J13-50


J13B.2 is an attempt to improve on J13 using the same method as for T40B.4 (there is a J13B.1, but it wasn't very successful and no nets were posted). Test results are listed after the nets.

3000 fixed node tests against parent net:

   # PLAYER                :  RATING  ERROR   POINTS  PLAYED   (%)     W     L      D
   1 lc0.net.J13B.2-178    :    11.4    6.2   1548.0    3000    52   425   329   2246
   2 lc0.net.J13B.2-136    :    11.1    6.3   1547.0    3000    52   405   311   2284
   3 lc0.net.J13B.2-148    :    10.1    6.3   1542.5    3000    51   426   341   2233
   4 lc0.net.J13B.2-200    :    10.1    6.2   1542.5    3000    51   427   342   2231
   5 lc0.net.J13B.2-188    :     9.2    6.3   1539.0    3000    51   410   332   2258
   6 lc0.net.J13B.2-168    :     8.2    6.4   1534.5    3000    51   408   339   2253
   7 lc0.net.J13B.2-220    :     7.6    6.4   1472.0    2882    51   401   339   2142
   8 lc0.net.J13B.2-120    :     7.3    6.2   1531.0    3000    51   421   359   2220
   9 lc0.net.J13B.2-158    :     3.6    6.3   1515.0    3000    51   390   360   2250
  10 lc0.net.J13-410       :     0.0   ----  13110.5   26882    49  3052  3713  20117

256x16b: 256 filters, 16 blocks, SE (ratio 8), WDL value head, conv policy head, trained on T40 games (training on hold in favor of other experiments)
