hipBLAS is a Basic Linear Algebra Subprograms (BLAS) marshalling library with multiple supported backends. It sits between your application and a 'worker' BLAS library, where it marshals inputs to the backend library and marshals results to your application. hipBLAS exports an interface that doesn't require the client to change, regardless of the chosen backend. Currently, hipBLAS supports rocBLAS and cuBLAS backends.
To use hipBLAS, you must first install rocBLAS, rocSPARSE, and rocSOLVER or cuBLAS.
Documentation for hipBLAS is available at https://rocm.docs.amd.com/projects/hipBLAS/en/latest/.
To build our documentation locally, use the following code:
cd docs
pip3 install -r sphinx/requirements.txt
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
Alternatively, build with CMake:
cmake -DBUILD_DOCS=ON ...
-
Download the hipBLAS source code (clone this repository):
git clone https://github.com/ROCmSoftwarePlatform/hipBLAS.git
hipBLAS requires specific versions of rocBLAS and rocSOLVER. Refer to [CMakeLists.txt](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/library/CMakeLists.txt) for details.
-
Build hipBLAS and install it into
/opt/rocm/hipblas
:cd hipblas ./install.sh -i
The hipBLAS interface is compatible with rocBLAS and cuBLAS-v2 APIs. Porting a CUDA application that originally calls the cuBLAS API to an application that calls the hipBLAS API is relatively straightforward. For example, the hipBLAS SGEMV interface is:
hipblasStatus_t
hipblasSgemv( hipblasHandle_t handle,
hipblasOperation_t trans,
int m, int n, const float *alpha,
const float *A, int lda,
const float *x, int incx, const float *beta,
float *y, int incy );
hipBLAS GEMM can process matrices in batches with regular strides by using the strided-batched version of the API:
hipblasStatus_t
hipblasSgemmStridedBatched( hipblasHandle_t handle,
hipblasOperation_t transa, hipblasOperation_t transb,
int m, int n, int k, const float *alpha,
const float *A, int lda, long long bsa,
const float *B, int ldb, long long bsb, const float *beta,
float *C, int ldc, long long bsc,
int batchCount);
hipBLAS assumes matrix A and vectors x, y are allocated in GPU memory space filled with data. You are responsible for copying data to and from the host and device memory.