diff --git a/INSTALL.rst b/INSTALL.rst
index 0c498b90c..6569e516f 100644
--- a/INSTALL.rst
+++ b/INSTALL.rst
@@ -26,8 +26,9 @@ You are of course also free to use your own Python installation, and the Kernel
 Installing Python Packages
 --------------------------
 
-Note that when you are using a native Python installation, the `pip` command used
-Kernel Tuner and its dependencies require `sudo` rights for system wide installation.
+Note that when you are using a native Python installation, the `pip` command used
+to install
+Kernel Tuner and its dependencies requires `sudo` rights for system-wide installation.
 
 Sudo rights are typically not required when using Miniconda or virtual environments.
 You could also use e.g. the `--user` or `--prefix` option of `pip` to install into
@@ -83,6 +84,15 @@ from an installation that is failing.
 
 If this fails, I recommend to see the PyCuda installation guide (https://wiki.tiker.net/PyCuda/Installation)
 
+Other CUDA Backends
+-------------------
+
+Kernel Tuner can also be used with CuPy (https://cupy.dev/) or Nvidia's CUDA Python bindings (https://nvidia.github.io/cuda-python/). Please see the installation instructions of those projects for how to install the required Python packages.
+
+Please refer to the documentation on `backends `__ on how to use and select these backends.
+
+
+
 OpenCL and PyOpenCL
 -------------------
 
diff --git a/doc/source/backends.rst b/doc/source/backends.rst
new file mode 100644
index 000000000..983174f62
--- /dev/null
+++ b/doc/source/backends.rst
@@ -0,0 +1,65 @@
+.. toctree::
+   :maxdepth: 2
+
+
+Backends
+========
+
+Kernel Tuner implements multiple backends for CUDA, one for OpenCL, one for HIP, and a generic
+Compiler backend.
+
+Selecting a backend is in most cases automatic and is done based on the kernel's programming
+language, but sometimes you'll want to specifically choose a backend.
+
+
+CUDA Backends
+-------------
+
+PyCUDA is the default CUDA backend in Kernel Tuner. It is comparable in feature completeness with CuPy.
+Because the HIP kernel language is identical to the CUDA kernel language, HIP is included here as well.
+To use HIP on Nvidia GPUs, see https://github.com/jatinx/hip-on-nv.
+
+While the PyCUDA backend expects all inputs and outputs to be NumPy arrays, the CuPy backend also
+supports cupy arrays as input and output arguments for the kernels. This gives the user more control
+over how memory is handled by Kernel Tuner. Checks during output verification can also happen
+entirely on the GPU when using only cupy arrays.
+
+Texture memory is only supported by the PyCUDA backend, while the CuPy backend is the only one that
+supports C++ signatures for the kernels. With the other backends, it is required that the kernel has
+extern "C" linkage. If not, the entire code is wrapped in an extern "C" block, which may cause issues
+if the code also contains C++ code that cannot have extern "C" linkage, including code that may be
+present in header files.
+
+As detailed further in :ref:`templates`, templated kernels are fully supported by the CuPy backend and
+limited support is implemented by Kernel Tuner to support templated kernels for the PyCUDA and
+CUDA-Python backends.
+
+
+.. csv-table:: Backend feature support
+   :header: Feature, PyCUDA, CuPy, CUDA-Python, HIP
+   :widths: auto
+
+   Compile kernels, ✓, ✓, ✓, ✓
+   Benchmark kernels, ✓, ✓, ✓, ✓
+   Observers, ✓, ✓, ✓, ✓
+   Constant memory, ✓, ✓, ✓, ✓
+   Dynamic shared memory, ✓, ✓, ✓, ✓
+   Texture memory, ✓, ✗, ✗, ✗
+   C++ kernel signature, ✗, ✓, ✗, ✗
+   Templated kernels, ✓, ✓, ✓, ✗
+
+
+Another important difference between the different backends is the compiler that is used. The table
+below lists which Python package is required, how the backend can be selected and which compiler is
+used to compile the kernels.
+
+
+.. csv-table:: Backend usage and compiler
+   :header: Feature, PyCUDA, CuPy, CUDA-Python, HIP
+   :widths: auto
+
+   Python package, "pycuda", "cupy", "cuda-python", "pyhip-interface"
+   Selected with lang=, "CUDA", "CUPY", "NVCUDA", "HIP"
+   Compiler used, "nvcc", "nvrtc", "nvrtc", "hiprtc"
+
+
diff --git a/doc/source/contents.rst b/doc/source/contents.rst
index 4d47c3455..e140621d6 100644
--- a/doc/source/contents.rst
+++ b/doc/source/contents.rst
@@ -27,6 +27,7 @@ The Kernel Tuner documentation
    :maxdepth: 1
    :caption: Features
 
+   backends
    cache_files
    correctness
    hostcode
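
The new backends page says a backend can be chosen explicitly through the values in the "Selected with lang=" row, but the diff does not show a call that does so. The sketch below is one way this might look, assuming the `lang` keyword of `kernel_tuner.tune_kernel` accepts those strings. The helper name `tune_vector_add`, the constant `CUDA_BACKEND_LANGS`, and the vector-add kernel are illustrative and not part of this change. Running the tuning itself needs a GPU and the Python package listed in the table for the chosen backend.

```python
# Selector strings copied from the "Selected with lang=" row of the backend table.
CUDA_BACKEND_LANGS = ("CUDA", "CUPY", "NVCUDA", "HIP")

# Plain CUDA C kernel; block_size_x is defined by Kernel Tuner for each configuration.
KERNEL_SOURCE = """
__global__ void vector_add(float *c, const float *a, const float *b, int n) {
    int i = blockIdx.x * block_size_x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
"""


def tune_vector_add(lang="CUPY", size=1_000_000):
    """Tune vector_add with the backend named by `lang` (hypothetical helper)."""
    if lang not in CUDA_BACKEND_LANGS:
        raise ValueError(f"unknown backend selector: {lang!r}")

    # Imported lazily so this module can be loaded without a GPU software stack.
    import numpy as np
    from kernel_tuner import tune_kernel

    a = np.random.randn(size).astype(np.float32)
    b = np.random.randn(size).astype(np.float32)
    c = np.zeros_like(a)
    args = [c, a, b, np.int32(size)]
    tune_params = {"block_size_x": [32, 64, 128, 256, 512]}

    # Passing lang= asks for a specific backend instead of automatic detection.
    return tune_kernel("vector_add", KERNEL_SOURCE, size, args, tune_params, lang=lang)
```

Calling `results, env = tune_vector_add(lang="NVCUDA")` would, under these assumptions, tune the same kernel through the CUDA-Python backend, while leaving `lang` out of the `tune_kernel` call would fall back to automatic selection.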