
Supervised Learning


What You Will Be Able to Do

After reading this article, you should be able to take a collection of PGNs and train a net using lczero-training.

Getting Ready

[Note: this article assumes you are using Linux. The same is possible on Windows, but as I don't use Windows, documenting the details will have to be left to someone else.]

You will need the following software:

  1. pgn-extract - the supervised learning pgn parser is very brittle. As a starting point, I would run your pgn file through pgn-extract with `pgn-extract -7 -C < input.pgn > output.pgn`. See the pgn-extract documentation for details.
  2. The "supervise" branch of my fork of lczero. Yes lczero is the old engine for the nets, but it also has the supervised training code in it, which I fixed and reenabled. This should also be merged into the master branch of the original repo. I don't control that, so can only be confident that my fork has the right code.
  3. The master branch of lczero-training. There's a fair bit of fiddling with the setup here: you'll need CUDA 9.0 for tensorflow et al., which is different from the CUDA 9.2 et al. you installed for lc0. I'll eventually add a section on configuring this beast. A rough setup sketch follows this list.
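
A minimal sketch of the setup. The repo URLs, branch name, and tensorflow version here are assumptions -- check the current READMEs of both projects before copying:

```bash
# Build the "supervise" branch of the lczero fork (URL and branch assumed).
git clone -b supervise https://github.com/dkappe/leela-chess.git
cd leela-chess
mkdir build && cd build
cmake .. && make -j"$(nproc)"

# Set up lczero-training in a virtualenv (tensorflow version assumed;
# the repo may also need an init step for protobufs -- see its README).
cd ../..
git clone https://github.com/LeelaChessZero/lczero-training.git
cd lczero-training
virtualenv -p python3 venv
source venv/bin/activate
pip install tensorflow-gpu==1.12.0  # 1.x builds link against CUDA 9.0
```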

The Process

The high-level supervised learning process runs as follows:

  1. Make sure the individual pgn files you will be converting to training data have fewer than 500k games each. The training software expects files -- called "chunks" -- with one game per chunk, so the training data directories will contain a potentially large number of files, which can become unwieldy. You can use pgn-extract to break a pgn file into equal-sized files of N games, as shown below; see the pgn-extract documentation for details.
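For example, splitting with pgn-extract's -# flag, which writes numbered output files (1.pgn, 2.pgn, ...) of N games each:

```bash
# Split big_input.pgn into files of 100000 games apiece.
pgn-extract -#100000 big_input.pgn
```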
  2. Clean up the pgn files with `pgn-extract -7 -C < input.pgn > output.pgn`. Change the filenames to reflect your naming scheme.
  3. Run lczero to generate the training data. Note that lczero requires a weights file for this step; the weights file is loaded but ignored. This is an artifact of the all-in-one nature of lczero.
```bash
./lczero -w weights_useless.txt.gz --supervise my_pgn_file.pgn
```
  4. Clean up the mess and edit your pgn file when lczero dumps core because of some minor pgn issue.
  5. Finally, you get a clean run. You should have a directory called supervise-my_pgn_file containing files of the form training.XXXXX.gz, where the X's are digits (there could be one digit or a dozen, depending on how many games you had). There should be one file for each game in your pgn, which you can verify as shown below.
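A quick sanity check (directory name from the step above):

```bash
# The number of chunk files should equal the number of games in the pgn.
ls supervise-my_pgn_file | wc -l
```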
  6. If you've converted several pgns, put all the various "supervise" directories in a common subdirectory, as below. This will make it easier to process them in the training step.
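For example (the chunks directory name is just a choice):

```bash
mkdir chunks
mv supervise-*/ chunks/
```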
  7. Determine how many "chunks" you have in the subdir by running `find subdir -type f | wc -l`. Let's assume we have 901265 chunks.
  8. In your lczero-training directory, change to the tf subdir. There should be a "configs" subdirectory. Let's copy the example config file, as shown next, and make it work for us.
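For example (the directory and file names are assumptions based on current master; the copy's name just matches the name: field below):

```bash
cd lczero-training/tf
cp configs/example.yaml configs/kb1-64x6.yaml
# then edit configs/kb1-64x6.yaml along these lines:
```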
```yaml
%YAML 1.2
---
name: 'kb1-64x6'                       # ideally no spaces
gpu: 0                                 # gpu id to process on

dataset: 
  num_chunks: 100000                   # newest nof chunks to parse
  train_ratio: 0.90                    # trainingset ratio
  # For separated test and train data.
  input_train: '/path/to/chunks/*/draw/' # supports glob
  input_test: '/path/to/chunks/*/draw/'  # supports glob
  # For a one-shot run with all data in one directory.
  # input: '/path/to/chunks/*/draw/'

training:
    batch_size: 2048                   # training batch
    test_steps: 2000                   # eval test set values after this many steps
    train_avg_report_steps: 200        # training reports its average values after this many steps.
    total_steps: 140000                # terminate after these steps
    # checkpoint_steps: 10000          # optional frequency for checkpointing before finish
    shuffle_size: 524288               # size of the shuffle buffer
    lr_values:                         # list of learning rates
        - 0.02
        - 0.002
        - 0.0005
    lr_boundaries:                     # list of boundaries
        - 100000
        - 130000
    policy_loss_weight: 1.0            # weight of policy loss
    value_loss_weight: 1.0             # weight of value loss
    path: '/path/to/store/networks'    # network storage dir

model:
  filters: 64
  residual_blocks: 6
...
```
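
Set num_chunks to the count from step 7 (901265 in our example), point input_train and input_test (or a single input:) at globs over your chunks subdirectory, and set path to a directory where networks should be stored. Then launch training from the tf directory. A sketch, assuming the master-branch train.py interface (check ./train.py --help):

```bash
cd lczero-training/tf
./train.py --cfg configs/kb1-64x6.yaml --output /tmp/kb1-64x6.txt
```

The --output argument, if your checkout supports it, writes the final weights in the text format the engine loads.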