Integration of CoreML Backend into Leela Chess Zero #1950

Draft: wants to merge 13 commits into master

ChinChangYang
Contributor

@ChinChangYang ChinChangYang commented Feb 13, 2024

Overview
This pull request adds a CoreML backend to Leela Chess Zero (lc0). The backend uses Apple's Core ML framework to run neural network evaluation on macOS, so that inference can be dispatched to the compute units available on Apple hardware.

Implementation Highlights

  • CoreML Backend Integration: Initiated with the creation of a coreml_backend namespace, this phase includes the setup of essential CoreML header and source files, alongside the initialization of MLModel within this namespace.
  • CoreMLModel Class: This newly developed class encapsulates the logic for CoreML model storage and retrieval, integrating tailored methods for handling inputs/outputs and predictions, thereby aligning with lc0's computational needs.
  • Model Initialization Enhancements: The model initialization process has been refined to utilize MLComputeUnits across CPU, GPU, and Apple's Neural Engine, ensuring optimal performance.
  • Comprehensive macOS Support: This backend brings full-fledged support to macOS, incorporating all necessary frameworks and file inclusions to harness CoreML's capabilities on Apple devices efficiently.
  • Consistent Code Styling: The project's .clang-format has been updated to ensure uniform coding styles across C++ and Objective-C++ languages, maintaining code clarity and consistency.

Prerequisites and Setup
Until coremltools 7.2 is released, coremltools must be built from pull request apple/coremltools#2087. Networks are converted to CoreML models with net_to_coreml.py from the lczero-training repository, following the instructions in LeelaChessZero/lczero-training#222.

Converting Networks to CoreML Models:

  1. Create a Python environment and install necessary packages:
conda create -n net-to-coreml-py39 python=3.9
conda activate net-to-coreml-py39
pip install numpy tensorflow tensorflow-metal protobuf==3.20.3 pyyaml coremltools
  2. Clone the lczero-training repository and prepare the environment:
git clone --recurse-submodules https://github.com/LeelaChessZero/lczero-training.git
cd lczero-training
git fetch origin pull/222/head:net-to-coreml
git switch net-to-coreml
./init.sh
  3. Download the network and configuration, then convert to CoreML model:
cd tf
wget -O 817580.lc0 "https://training.lczero.org/get_network?sha=7658338877bcf498b3329b9c196abfb123659dc16a5db298a2378cc4a5bb25ba"
wget https://gist.githubusercontent.com/ChinChangYang/948bc4f9114dfb512abff8e3b2392962/raw/7400327b759af7703aaa0052e0e837ebc25e1cc8/768x15x24h-t80.yaml
python net_to_coreml.py --cfg 768x15x24h-t80.yaml -e 817580.lc0
  4. Transfer the CoreML model and network to lc0's directory:
cp -r dev1/networks/768x15x24h-t80/817580.lc0.mlpackage /path/to/lc0/build/release/lc0.mlpackage
cp 817580.lc0 /path/to/lc0/build/release/
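
For anyone scripting the transfer step, the following is a minimal Python sketch of the two `cp` commands above. The function name `install_coreml_model` is hypothetical and not part of lc0 or lczero-training. It assumes the backend looks for a package named `lc0.mlpackage` next to the binary, as the commands above and the logs in this thread suggest. Unlike `cp -r`, it replaces an existing `lc0.mlpackage` rather than copying into it.

```python
import shutil
from pathlib import Path


def install_coreml_model(package: Path, weights: Path, lc0_dir: Path) -> Path:
    """Copy a converted .mlpackage and its network file into lc0's directory.

    Hypothetical helper: mirrors the two `cp` commands from the setup steps.
    """
    if not package.is_dir():
        raise FileNotFoundError(f"CoreML package not found: {package}")
    if not weights.is_file():
        raise FileNotFoundError(f"Network file not found: {weights}")

    lc0_dir.mkdir(parents=True, exist_ok=True)
    target = lc0_dir / "lc0.mlpackage"

    # An .mlpackage is a directory; remove any stale copy before copying,
    # so the result never ends up nested inside an old package.
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(package, target)

    # The network file keeps its original name (e.g. 817580.lc0).
    shutil.copy2(weights, lc0_dir / weights.name)
    return target
```

Called with the paths from step 4, this would be `install_coreml_model(Path("dev1/networks/768x15x24h-t80/817580.lc0.mlpackage"), Path("817580.lc0"), Path("/path/to/lc0/build/release"))`.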

Benchmarking lc0 with CoreML Backend:
Navigate to lc0's release directory and execute the benchmark command to evaluate performance:

% cd /path/to/lc0/build/release/
% ./lc0 benchmark -b coreml
[...]
===========================
Total time (ms) : 345668
Nodes searched  : 95896
Nodes/second    : 277
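
As a sanity check on the summary above, the reported nodes/second figure is consistent with dividing the node count by the elapsed time in seconds. This is a small sketch of that arithmetic, not lc0's own code; the function name is illustrative and the rounding step is an assumption.

```python
def nodes_per_second(nodes: int, total_time_ms: int) -> float:
    """Return search speed in nodes per second from a benchmark summary."""
    if total_time_ms <= 0:
        raise ValueError("total_time_ms must be positive")
    return nodes * 1000 / total_time_ms


# Values copied from the benchmark summary above.
nps = nodes_per_second(nodes=95896, total_time_ms=345668)
print(round(nps))  # 277
```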

Contribution and Review Request
This pull request asks for a review of the CoreML backend integration, in particular its compatibility, its performance, and its fit with lc0's architecture. Feedback, suggestions, and further optimizations are welcome.

- Create coreml_backend namespace.
- Create CoreML header file.
- Create CoreML Objective-C++ source file.
- Create CoreML constructor.
- Compile and initialize MLModel in the CoreML constructor.
Introduce CoreMLModel class to encapsulate MLModel storage and retrieval.
Let CoreMLModel accommodate multiple outputs for model predictions.
Improve CoreML model initialization and inference by setting MLComputeUnits to MLComputeUnitsCPUAndGPU for enhanced accuracy during forward evaluation. Update the forwardEval method to properly handle input data and log CoreML output.
Consolidated input array setup, added error checking and post-prediction calculations for output values, and output moves left in CoreML forward evaluation.
Added CoreML backend support with necessary frameworks check and file inclusion for the macOS build. This enhancement enables using CoreML for neural network computations, expanding functionality.
Extracts input setup and prediction functions to enhance modularity and readability.
Add Objective-C style configuration to .clang-format for consistent code formatting across different languages. This update ensures proper styling for Objective-C code, maintaining uniformity in the codebase.
Modified CoreML constructor to accept boolean parameters for WDL and moves_left settings, enhancing configurability and flexibility in model instantiation. This change improves the compatibility of the CoreML component.
Refactored the CoreML code to use exception throwing for error handling instead of returning null, in order to improve error reporting and handling. This change also includes signaling the semaphore after the model initialization task is complete. This commit addresses the need for more robust error handling and better signaling of task completion in the CoreML module.
@ChinChangYang
Contributor Author

Absolute error histogram

% ./lc0 benchmark -b check --backend-opts=mode=histo,coreml --num-positions=1

       _
|   _ | |
|_ |_ |_| v0.31.0-dev+git.dirty built Feb  6 2024
Found pb network file: ./weights_run1_817374.lc0
Creating backend [check]...
Working backend set to coreml.
Reference backend set to eigen.
Creating backend [coreml]...
2024-02-06 21:56:53.101 lc0[73824:20798742] Compiling model: lc0.mlpackage/ -- file:///Users/chinchangyang/Code/lc0-ccy/build/release/
2024-02-06 21:56:53.317 lc0[73824:20798766] Compiled model URL: file:///var/folders/dv/kdr9x4yn4s106_94ydk5jnjc0000gn/T/lc0_D13B185D-B44F-4F09-9DE6-F447BE2294CB.mlmodelc
2024-02-06 21:56:53.317 lc0[73824:20798766] Initializing model with the compiled model URL...
2024-02-06 21:56:59.723 lc0[73824:20798766] Model successfully initialized
Creating backend [eigen]...
Using Eigen version 3.4.0
Eigen max batch size is 256.
Check mode: histogram.
Check rate: 20%.

Position: 1/1 rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Benchmark time 61 ms, 2 nodes, 36 nps, move g1f3
Benchmark time 114 ms, 8 nodes, 74 nps, move d2d4
Benchmark time 203 ms, 9 nodes, 45 nps, move d2d4
Benchmark time 364 ms, 15 nodes, 41 nps, move d2d4
Benchmark time 382 ms, 19 nodes, 50 nps, move g1f3
Benchmark time 413 ms, 22 nodes, 54 nps, move g1f3
Benchmark time 502 ms, 40 nodes, 80 nps, move e2e3
Benchmark time 628 ms, 66 nodes, 106 nps, move g1f3
Benchmark time 652 ms, 102 nodes, 157 nps, move g1f3
Benchmark time 943 ms, 187 nodes, 199 nps, move g1f3
Benchmark time 1002 ms, 206 nodes, 207 nps, move e2e4
Absolute error histogram for a batch of 14
      |                                                                                         |
      |                                                                   #                     |
      |                                                                   ##                    |
 0.15 +                                                                   ##                    +
      |                                                                   ##                    |
      |                                                                   ###                   |
      |                                                                   ###                   |
      |                                                                  ####                   |
  0.1 +                                                                  ####                   +
      |                                                                 #####                   |
      |                                                                 #####                   |
      |                                                                 #######                 |
      |                                                                 #######                 |
 0.05 +                                                                 ####### ##              +
      |                                                                 ##########              |
      |                                                               # ############            |
      |                                                               ##############            |
      |                                                           ####################          |
      +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
   -inf  -15  -14  -13  -12  -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1    0    1  +inf 
Absolute error histogram for a batch of 32
      |                                                                                         |
 0.15 +                                                                   ##                    +
      |                                                                   ##                    |
      |                                                                   ##                    |
      |                                                                  ####                   |
      |                                                                  ####                   |
  0.1 +                                                                  ####                   +
      |                                                                 ######                  |
      |                                                                 ######                  |
      |                                                                 #######                 |
      |                                                                 #######                 |
 0.05 +                                                                ########                 +
      |                                                                #########                |
      |                                                                ########## ###           |
      |                                                              ################           |
      |                                                       #########################         |
      +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
   -inf  -15  -14  -13  -12  -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1    0    1  +inf 
Benchmark time 5088 ms, 262 nodes, 51 nps, move g1f3
Benchmark time 5315 ms, 345 nodes, 64 nps, move c2c4
Benchmark time 5451 ms, 452 nodes, 83 nps, move d2d4
Benchmark time 5835 ms, 628 nodes, 107 nps, move d2d4
Absolute error histogram for a batch of 32
 0.15 +                                                                                         +
      |                                                                   ##                    |
      |                                                                  ###                    |
      |                                                                  ###                    |
      |                                                                 ####                    |
  0.1 +                                                                 #####                   +
      |                                                                 ######                  |
      |                                                                 ######                  |
      |                                                                 ######                  |
      |                                                                #######                  |
 0.05 +                                                                #########                +
      |                                                                ######### #              |
      |                                                               #############             |
      |                                                             # ###############           |
      |                                                         ######################          |
      +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
   -inf  -15  -14  -13  -12  -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1    0    1  +inf 
Benchmark time 10005 ms, 883 nodes, 88 nps, move d2d4
bestmove d2d4
Absolute error histogram for a batch of 32
      |                                                                                         |
      |                                                                   #                     |
 0.15 +                                                                   #                     +
      |                                                                   ##                    |
      |                                                                   ##                    |
      |                                                                  ###                    |
      |                                                                  ####                   |
  0.1 +                                                                  ####                   +
      |                                                                  #####                  |
      |                                                                  #####                  |
      |                                                                 ######                  |
      |                                                                 #######                 |
 0.05 +                                                                 #######                 +
      |                                                                 ######## #              |
      |                                                               ##############            |
      |                                                             #################           |
      |                                                 ##     #######################          |
      +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
   -inf  -15  -14  -13  -12  -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1    0    1  +inf 
Absolute error histogram for a batch of 73
  0.2 +                                                                                         +
      |                                                                    #                    |
      |                                                                    #                    |
      |                                                                    #                    |
      |                                                                    #                    |
 0.15 +                                                                    #                    +
      |                                                                   ##                    |
      |                                                                   ###                   |
      |                                                                   ###                   |
      |                                                                   ###                   |
  0.1 +                                                                   ###                   +
      |                                                                  #####                  |
      |                                                                  #####                  |
      |                                                                  ######                 |
      |                                                                  ######                 |
 0.05 +                                                                 #######                 +
      |                                                                 ########                |
      |                                                               ##############            |
      |                                                            ##################           |
      |                                                     ## #######################          |
      +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
   -inf  -15  -14  -13  -12  -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1    0    1  +inf 

===========================
Total time (ms) : 15160
Nodes searched  : 985
Nodes/second    : 65

@ChinChangYang
Contributor Author

Backendbench

% ./lc0 backendbench -b coreml --max-batch-size=32                           

       _
|   _ | |
|_ |_ |_| v0.31.0-dev+git.dirty built Feb  6 2024
Found pb network file: ./weights_run1_817374.lc0
Creating backend [coreml]...
2024-02-06 21:58:45.937 lc0[73848:20800526] Compiling model: lc0.mlpackage/ -- file:///Users/chinchangyang/Code/lc0-ccy/build/release/
2024-02-06 21:58:46.152 lc0[73848:20800532] Compiled model URL: file:///var/folders/dv/kdr9x4yn4s106_94ydk5jnjc0000gn/T/lc0_6C49057C-F712-4EA4-82DA-025C0114D284.mlmodelc
2024-02-06 21:58:46.152 lc0[73848:20800532] Initializing model with the compiled model URL...
2024-02-06 21:58:52.523 lc0[73848:20800532] Model successfully initialized
Benchmark batch size 1 with inference average time 3.219ms - throughput 310.656 nps.
Benchmark batch size 2 with inference average time 5.9926ms - throughput 333.745 nps.
Benchmark batch size 3 with inference average time 8.73419ms - throughput 343.478 nps.
Benchmark batch size 4 with inference average time 11.4712ms - throughput 348.699 nps.
Benchmark batch size 5 with inference average time 14.4862ms - throughput 345.156 nps.
Benchmark batch size 6 with inference average time 17.2667ms - throughput 347.489 nps.
Benchmark batch size 7 with inference average time 20.1971ms - throughput 346.585 nps.
Benchmark batch size 8 with inference average time 22.8665ms - throughput 349.856 nps.
Benchmark batch size 9 with inference average time 25.7457ms - throughput 349.572 nps.
Benchmark batch size 10 with inference average time 28.3601ms - throughput 352.608 nps.
Benchmark batch size 11 with inference average time 31.1257ms - throughput 353.406 nps.
Benchmark batch size 12 with inference average time 34.1467ms - throughput 351.425 nps.
Benchmark batch size 13 with inference average time 36.6708ms - throughput 354.506 nps.
Benchmark batch size 14 with inference average time 39.3265ms - throughput 355.994 nps.
Benchmark batch size 15 with inference average time 42.7149ms - throughput 351.165 nps.
Benchmark batch size 16 with inference average time 45.0219ms - throughput 355.383 nps.
Benchmark batch size 17 with inference average time 47.9304ms - throughput 354.681 nps.
Benchmark batch size 18 with inference average time 50.6393ms - throughput 355.455 nps.
Benchmark batch size 19 with inference average time 53.6795ms - throughput 353.953 nps.
Benchmark batch size 20 with inference average time 56.2379ms - throughput 355.632 nps.
Benchmark batch size 21 with inference average time 58.8418ms - throughput 356.889 nps.
Benchmark batch size 22 with inference average time 62.2577ms - throughput 353.37 nps.
Benchmark batch size 23 with inference average time 64.3085ms - throughput 357.651 nps.
Benchmark batch size 24 with inference average time 66.891ms - throughput 358.792 nps.
Benchmark batch size 25 with inference average time 69.6432ms - throughput 358.973 nps.
Benchmark batch size 26 with inference average time 72.4245ms - throughput 358.995 nps.
Benchmark batch size 27 with inference average time 75.6138ms - throughput 357.078 nps.
Benchmark batch size 28 with inference average time 78.1265ms - throughput 358.393 nps.
Benchmark batch size 29 with inference average time 80.7182ms - throughput 359.275 nps.
Benchmark batch size 30 with inference average time 83.8129ms - throughput 357.94 nps.
Benchmark batch size 31 with inference average time 86.329ms - throughput 359.091 nps.
Benchmark batch size 32 with inference average time 89.3303ms - throughput 358.221 nps.
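
The throughput column in this log can be cross-checked against the timing column: each reported value matches batch size divided by average inference time. The sketch below parses a few of the lines above and recomputes throughput. The regular expression and helper names are my own, written against the log format shown here rather than taken from lc0.

```python
import re

# Matches lines such as:
# "Benchmark batch size 8 with inference average time 22.8665ms - throughput 349.856 nps."
LINE_RE = re.compile(
    r"Benchmark batch size (\d+) with inference average time ([\d.]+)ms"
    r" - throughput ([\d.]+) nps\."
)


def parse_backendbench(log: str) -> list:
    """Return (batch_size, avg_time_ms, reported_nps) tuples found in the log."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in LINE_RE.finditer(log)
    ]


def derived_throughput(batch_size: int, avg_time_ms: float) -> float:
    """Positions evaluated per second for one batch of the given size."""
    return batch_size * 1000.0 / avg_time_ms


# Three lines copied from the log above.
sample = """\
Benchmark batch size 1 with inference average time 3.219ms - throughput 310.656 nps.
Benchmark batch size 8 with inference average time 22.8665ms - throughput 349.856 nps.
Benchmark batch size 32 with inference average time 89.3303ms - throughput 358.221 nps.
"""

for batch, avg_ms, reported in parse_backendbench(sample):
    print(batch, f"{derived_throughput(batch, avg_ms):.3f}", reported)
```

On these numbers, throughput rises by only about 15% between batch size 1 and batch size 32, because inference time grows almost linearly with batch size.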

@ChinChangYang
Contributor Author

During the compilation documented in Mac (5438) - LeelaChessZero/lc0, the build failed because the Core ML method compileModelAtURL:completionHandler: was unavailable. This method requires macOS 13.0 or later, but the current environment runs macOS 12.3.1 with Xcode 13.4, as detailed in the supported Xcode versions. Updating the Xcode version specified in config.yml to 14.3.1 or later would also move the build to a macOS version that meets this requirement, which should resolve the compilation error.
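
To make the version requirement concrete, here is a small Python sketch of the comparison involved. It is illustrative only: the backend itself is Objective-C++, and the names `MIN_MACOS` and `supports_async_compile` are hypothetical. The 13.0 threshold is the one stated above.

```python
import platform

# Minimum macOS version for compileModelAtURL:completionHandler:, per the comment above.
MIN_MACOS = (13, 0)


def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '12.3.1' into a tuple of ints."""
    parts = tuple(int(p) for p in text.split("."))
    # Pad short versions such as '13' to at least (major, minor).
    return parts + (0,) * (2 - len(parts))


def supports_async_compile(macos_version: str) -> bool:
    """True if the given macOS version meets the 13.0 requirement."""
    return parse_version(macos_version)[:2] >= MIN_MACOS


print(supports_async_compile("12.3.1"))  # False
print(supports_async_compile("13.0"))    # True

# platform.mac_ver() returns an empty version string on non-macOS systems.
current = platform.mac_ver()[0]
if current:
    print(current, supports_async_compile(current))
```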

To resolve a compilation error in LeelaChessZero/lc0 due to the unavailability of a Core ML framework method in macOS 12.3.1, the Xcode version in config.yml is updated to 14.3.1. This ensures compatibility with macOS 13.0 or later, potentially resolving the error.