int8_t gemm benchmark between Eigen, kpu's intgemm, dnnl, fbgemm, and mkl.
mkdir build
cd build
cmake -DWITH_MKL=OFF ..
make -j
If you have Intel MKL installed on your system, you can set -DWITH_MKL=ON
during the CMake configuration.
./benchmark [iterations=1000] [arch=any] [use_eigen=0]
Some paramters are hardcoded
- The memory alignment can be changed here: https://github.com/XapaJIaMnu/gemmBench/blob/master/bench.cpp#L338
- Matrix sizes can be changed here: https://github.com/XapaJIaMnu/gemmBench/blob/master/bench.cpp#L372
- For intgemm to work, you need M and N to be a multiple of 8 and K to be a multiple of 32
- The number of iterations of the loop can be varied through command one.
- You can limit
arch
forintgemm
anddnnl
. Supported values:ssse3
,avx2
,avx512
,avx512vnni
andany
- Since
Eigen
is a lot slower than the other two, its execution is disabled by default. To enable it, provide the argument.
- Fbgemm only supports AVX2 processors or newer, so the test is skipped on older architectures.
- Fbgemm doesn't allow for limiting the arch type, so the test is skipped in case explicit arch is requested