Training through Kaggle GPU
If you don't have a CUDA-capable GPU, or if you're having trouble setting up your Python environment, Kaggle may be a good option for you. CUDA, PyTorch, and the rest of the required environment come preinstalled, so you can follow the instructions below to train an NNUE network.
Notebook: https://www.kaggle.com/fabianfichter/variant-nnue-demo
Kaggle is a crowd-sourced platform for data scientists. It is owned by Google and offers every user a free GPU quota of 30 hours per week, which is enough for NNUE training. You must register a Kaggle account before you proceed.
See Kaggle Docs for further information.
- Follow the training data generation guide to generate enough data.
- Fork this repository.
- Change the code according to the output of the data generator.
2. Open the notebook and click Edit at the top.
You should see your training data under Input.
The training command has this form, with the data location passed twice:
!cd variant-nnue-pytorch && python train.py --gpus 1 --max_epochs 10 location_of_your_data location_of_your_data
In my case, it is:
!cd variant-nnue-pytorch && python train.py --gpus 1 --max_epochs 10 /kaggle/input/xiangqi-data/xiangqi_data.bin /kaggle/input/xiangqi-data/xiangqi_data.bin
You can copy the location by clicking the copy button next to your training data.
For more information about the parameters, see https://github.com/fairy-stockfish/variant-nnue-pytorch/wiki/NNUE-training#training-example.
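The same training step can also be launched from a Python cell instead of the `!cd ... && python train.py` line. This is a minimal sketch that reuses only the arguments shown above (`--gpus 1`, `--max_epochs 10`, and the data path passed twice as positional arguments); the helper names `build_train_command` and `run_training` are hypothetical, not part of the repository.

```python
import shlex
import subprocess
import sys


def build_train_command(data_path, max_epochs=10, gpus=1):
    """Return the train.py argument list from the example above.

    Only flags shown on this page are used; the same data file is
    passed twice as positional arguments, exactly as in the wiki command.
    """
    return [
        sys.executable, "train.py",
        "--gpus", str(gpus),
        "--max_epochs", str(max_epochs),
        data_path, data_path,
    ]


def run_training(data_path, repo_dir="variant-nnue-pytorch"):
    """Run train.py inside the repository directory (like `cd ... &&`)."""
    # check=True raises CalledProcessError if training exits with an error.
    subprocess.run(build_train_command(data_path), cwd=repo_dir, check=True)


# Print the command for inspection before launching it with run_training().
cmd = build_train_command("/kaggle/input/xiangqi-data/xiangqi_data.bin")
print(shlex.join(cmd))
```

Using `sys.executable` instead of a bare `python` makes the subprocess use the same interpreter as the notebook kernel.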
After training finishes, you can run this command to check whether the checkpoint files were saved:
!cd variant-nnue-pytorch/logs/default/version_0/checkpoints && ls
If it lists files such as checkpoint_0.ckpt or last.ckpt, the training is complete. You can then export a checkpoint to an NNUE file:
!cd variant-nnue-pytorch && python serialize.py --features HalfKAv2^ /kaggle/working/variant-nnue-pytorch/logs/default/version_0/checkpoints/last.ckpt Your_NNUE.nnue
Note that the name of the .nnue file should follow the naming rule.
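Checking for checkpoints and exporting can also be combined in one Python cell. This sketch reuses the checkpoint directory, the `HalfKAv2^` feature set and the `Your_NNUE.nnue` placeholder from the commands above; the helper names are hypothetical, and preferring last.ckpt and otherwise the most recently modified .ckpt is an assumption, not a rule of the repository.

```python
import sys
from pathlib import Path

# Checkpoint directory used on this page (relative to /kaggle/working).
CHECKPOINT_DIR = Path("variant-nnue-pytorch/logs/default/version_0/checkpoints")


def pick_checkpoint(checkpoint_dir=CHECKPOINT_DIR):
    """Return last.ckpt if present, otherwise the newest .ckpt file, or None."""
    checkpoint_dir = Path(checkpoint_dir)
    if not checkpoint_dir.is_dir():
        return None
    last = checkpoint_dir / "last.ckpt"
    if last.is_file():
        return last
    # Fall back to the most recently modified checkpoint (an assumption).
    candidates = sorted(checkpoint_dir.glob("*.ckpt"), key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None


def build_serialize_command(checkpoint, output_name):
    """Return the serialize.py argument list from the example above."""
    # resolve() makes the path absolute, so it stays valid after the
    # working directory changes to variant-nnue-pytorch.
    return [
        sys.executable, "serialize.py",
        "--features", "HalfKAv2^",
        str(Path(checkpoint).resolve()),
        output_name,
    ]


ckpt = pick_checkpoint()
if ckpt is None:
    print("No .ckpt file found yet; training may not have finished.")
else:
    print(build_serialize_command(ckpt, "Your_NNUE.nnue"))
```

To export, pass the list to `subprocess.run(cmd, cwd="variant-nnue-pytorch", check=True)`, replacing `Your_NNUE.nnue` with a name that follows the naming rule.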
Click the Save Version button at the top, then select Quick Save.
Exit and go back to the notebook's home page. Click Data to see the output. You can download the whole output by clicking the three-dot menu.
The .nnue file will be located in the variant-nnue-pytorch directory.
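Before saving a version, you can confirm from a Python cell that the export actually produced a file. This is a hypothetical helper that assumes the directory layout from the commands above (`/kaggle/working/variant-nnue-pytorch`); it only looks at files directly inside that directory.

```python
from pathlib import Path


def find_nnue_files(root="/kaggle/working/variant-nnue-pytorch"):
    """Map each .nnue file directly inside root to its size in bytes."""
    root = Path(root)
    if not root.is_dir():
        return {}
    return {p.name: p.stat().st_size for p in sorted(root.glob("*.nnue"))}


# An empty result or a 0-byte file suggests the export did not succeed.
print(find_nnue_files())
```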
A Kaggle session is reset after 12 hours, so keep track of the time and save before the reset. Everything except the files you have saved will be cleared.
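The 12-hour limit lends itself to a simple countdown you can check between cells. This is a hypothetical sketch: the timer only starts when its cell runs (not when Kaggle started the session), and the 30-minute safety margin is an arbitrary assumption meant to leave time for exporting and saving.

```python
import time

SESSION_LIMIT_S = 12 * 60 * 60  # 12-hour session limit mentioned above
SAFETY_MARGIN_S = 30 * 60       # assumed margin to leave time for saving


class SessionTimer:
    """Approximate countdown to the session reset, starting when created."""

    def __init__(self, start=None, limit_s=SESSION_LIMIT_S, margin_s=SAFETY_MARGIN_S):
        # time.monotonic() is unaffected by wall-clock adjustments.
        self.start = time.monotonic() if start is None else start
        self.limit_s = limit_s
        self.margin_s = margin_s

    def remaining(self, now=None):
        """Seconds left before the assumed reset."""
        now = time.monotonic() if now is None else now
        return self.limit_s - (now - self.start)

    def should_save(self, now=None):
        """True once the remaining time falls within the safety margin."""
        return self.remaining(now) <= self.margin_s


timer = SessionTimer()
print(f"{timer.remaining() / 3600:.1f} h left")
```

Create the timer in the first cell you run, then call `timer.should_save()` before starting another long step.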
2. I copied the whole notebook and followed all the commands, but I get a runtime error during training.
Kaggle updates its packages from time to time. Make sure your copy uses the same environment as the original notebook.
Also, make sure you have turned on the GPU.
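When comparing your copy with the original notebook, it can help to print the installed package versions in both and check that a GPU is visible. In this sketch, the package names `torch` and `pytorch-lightning` are assumptions about what the training scripts depend on, and finding `nvidia-smi` on the PATH is only a heuristic for "the GPU is turned on"; once PyTorch is installed, `torch.cuda.is_available()` is the more direct check.

```python
import importlib.metadata
import shutil


def installed_version(package):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def gpu_driver_visible():
    """Heuristic: True if NVIDIA's nvidia-smi tool is on PATH.

    This suggests a GPU is attached; it does not prove CUDA works.
    """
    return shutil.which("nvidia-smi") is not None


# Package names are assumptions about what train.py depends on.
for pkg in ("torch", "pytorch-lightning"):
    print(pkg, installed_version(pkg))
print("nvidia-smi found:", gpu_driver_visible())
```

Run the same cell in the original notebook and in your copy, then compare the printed versions.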