This is the repository for a MLIR-based convolution and GEMM kernel generator
targetting AMD hardware. This generator is mainly used from
MIGraphX,
but it can be used on a standalone basis. (The ability to use this code via
torch-mlir
is being investigated as well.)
To build the system
mkdir build
cd build
cmake -G Ninja .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++
ninja check-rocmlir
Note that we require building against a relatively recent clang. The above commands specify the ROCm clang release in order to match our standard development practice.
To not actually run the tests, use check-rocmlir-build-only
.
To build the static library that is used by MIGraphX
mkdir build
cd build
cmake -G Ninja .. -DBUILD_FAT_LIBROCKCOMPILER=On -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++
ninja
and to install it so MIGraphX can find it
cmake --install . --prefix [your/MIGraphX/deps/folder/path]
For usage examples, see mlir/test/rocmlir-driver
, especiallly the files
sanity.mlir
and the contents of the e2e_for_pr
directory.
This project also includes code that translates from TOSA to kernels, see
mlir/test/fusion
for examples of how to invoke it.
In general (with all invocations given from the build directory)
./bin/rocmlir-gen
generates high-level convolution operations and host code. Many of the options control data layout, size, etc, but some other useful flags are:-mfma=on
(which enables mfma usage) (or-wmma=on
for gfx11 targets)-mfma=off
(which disables mfma usage) (or-wmma=off
for gfx11 targets)-ph
(which causes host code to be generated)-pv
(which makes the host code validtae the results against a reference)-pv_with_gpu
(which uses a GPU validator instead)-pr
(which prints kkrnel results)
./bin/rocmlir-driver
is a wrapper around the kernel generation pipeline. Use-c
(or--kernel-pipeline=full --host-pipeline=runner
) to run the default pipeline
The result of this pipeline should, most simply, be passed to the rocm-run
script in mlir/utils/widgets//rocm-run
, which calls mlir-cpu-runner
with
the appropriate flags and infers the pathnames for libraries correctly.
In more detail, the result of the above pipeline can be passed to
./external/llvm-project/llvm/bin/mlir-cpu-runner
.
mlir-cpu-runner
needs to link the generated host code against libraries that
map from MLIR operations to the HIP runtime.
The required command-line arguments (if running from build/
) are
./external/llvm-project/llvm/bin/mlir-cpu-runner --shared-libs=./external/llvm-project/llvm/lib/libmlir_rocm_runtime.so,./lib/libconv-validation-wrappers.so,./external/llvm-project/llvm/lib/libmlir_runner_utils.so --entry-point-result=void
Adding --debug-only=serialize-to-blob
to the rocmlir-driver
invocation
will cause the GCN assembly code for the kernels being executed to be dumped to
standard error.
By default, we infer the use of GPU-specific acceleration instructions, like MFMA or WMMA, based on the features of the currently available GPU.
To disable this, add -DROCMLIR_GEN_FLAGS="-mfma=off -wmma=off"
to
the cmake
invocations given above. Note that this will not affect behavior
in production/static library builds, which do not use rocmlir-gen
.