
Onednn backend #1558

Merged — 32 commits merged into LeelaChessZero:master on Jun 16, 2021

Conversation


@borg323 borg323 commented May 11, 2021

This is the long-awaited onednn backend (formerly dnnl, formerly mkl-dnn). To build, pass `-Donednn=true -Ddnnl_dir=path/to/dnnl/library` to meson. It works with both cpu and (intel) gpu. For gpu you will need to build the dnnl library yourself (or ask me for a dll).
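A full build invocation might look like the following. This is a sketch, assuming a standard meson/ninja setup as used elsewhere in lc0; `path/to/dnnl/library` is a placeholder for wherever your dnnl install lives:

```shell
# Sketch of building lc0 with the onednn backend enabled.
# The dnnl_dir value is a placeholder, not a real path.
meson setup builddir --buildtype=release -Donednn=true -Ddnnl_dir=path/to/dnnl/library
ninja -C builddir
```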

For best performance, use just one search thread when running on a cpu, as a second search thread interferes with the onednn computing threads (onednn on a cpu uses all available cores by default).

There are several backend options; the most important are:

| option | values | default | comment |
| --- | --- | --- | --- |
| gpu | empty or integer | empty | Select the gpu to use; empty for cpu. |
| winograd | empty, true or false | empty | Set to true to use Winograd 3x3 convolution on cpu, false for direct convolution, or empty to let the library choose. Currently dnnl after v2.0.0 may get this wrong, and Winograd is not supported on all processors, so it may exit with a not very informative error. |
| fp16 | true or false | true on gpu, false on cpu | Use fp16 (or bf16 if your cpu supports it) for computation. |
| batch | integer | 64 for fp16, 32 for fp32 | The minimum batch size to use, as the library prepares kernels on startup. Set to 0 for dynamic kernel recompilation on every batch (not recommended). |
| steps | integer | 2 | Prepare kernels for this many multiples of the batch size. |
| threads | empty or integer | empty | Number of cpu threads to use; empty to let the library decide. |
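Putting a few of these together, cpu and gpu runs might look like this. This is a sketch: it assumes lc0 accepts comma-separated key=value pairs in `--backend-opts` (as with other backends), and the option values shown are illustrative, not recommendations:

```shell
# Illustrative invocations only; tune option values for your hardware.
# CPU: single search thread (as advised above), direct convolution, 4 onednn threads.
./lc0 benchmark -w 703810.pb.gz --threads=1 --backend=onednn --backend-opts=winograd=false,threads=4
# Intel GPU 0 with fp16 and a larger minimum batch size.
./lc0 benchmark -w 703810.pb.gz --backend=onednn --backend-opts=gpu=0,fp16=true,batch=128
```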


borg323 commented May 11, 2021

This is dnnl 1.8.0 compiled with both gpu and cpu support: dnnl.dll.zip

Example command line for benchmark:

```
./lc0 benchmark -w 703810.pb.gz --threads=1 --backend=onednn --backend-opts=gpu=0
```

@borg323 borg323 merged commit 6b1f83e into LeelaChessZero:master Jun 16, 2021
@borg323 borg323 deleted the onednn_back branch June 16, 2021 11:42

aochoam commented Mar 12, 2022

@borg323 Result of sanity checking the dx12 driver:

```
       _
|   _ | |
|_ |_ |_| v0.28.2 built Dec 13 2021
Detected 4 core(s) and 8 thread(s) in 1 group(s).
Group 0 has 4 core(s) and 8 thread(s).
Found pb network file: C:\Arena\Engines\lc0_dx12/771479.pb.gz
Creating backend [check]...
Working backend set to dx12.
Reference backend set to eigen.
Creating backend [dx12]...
Creating backend [eigen]...
Using Eigen version 3.3.7
Eigen max batch size is 256.
Check mode: check only with relative tolerance 1.0e-04, absolute tolerance 5.0e-01.
Check rate: 100%.

Position: 1/1 rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
*** ERROR check failed for a batch of 1 policy incorrect (but value ok).
*** ERROR check failed for a batch of 19 policy incorrect (but value ok).
Benchmark time 6218 ms, 20 nodes, 21 nps, move b1c3
*** ERROR check failed for a batch of 20 policy incorrect (but value ok).
Benchmark time 6260 ms, 21 nodes, 21 nps, move b1a3
Benchmark time 10007 ms, 21 nodes, 4 nps, move b1a3
bestmove b1a3
*** ERROR check failed for a batch of 144 both value and policy incorrect.
*** ERROR check failed for a batch of 256 both value and policy incorrect.

===========================
Total time (ms) : 18655
Nodes searched : 421
Nodes/second : 23
Press any key to continue . . .
```


aochoam commented Mar 12, 2022

This version does not work for me.
