amcamd edited this page Jul 24, 2018 · 43 revisions

Building rocBLAS

To build the rocBLAS library and clients, see 1.Build.

Example code

For example code and a Makefile that use rocBLAS, see 2.Example.

Running client executables

For instructions on how to run/use the client code, see 3.Running.

Functionality, API

rocBLAS exports the functions listed in 4.Exported functions.

Logging functions

See 5.Logging for the environment variables that make rocBLAS output logging information for each rocBLAS call. Output is streamed to standard error. Use logging only for diagnostics, because it slows down the code.

Training

rocBLAS uses Tensile for its gemm functions. Tensile can be tuned for specific gemm sizes. rocBLAS includes default training, and most users will never need to train. Information on training is in 6.Train.

Device and stream management in rocBLAS and HIP

For information on device and stream management, see 7.Device and stream management.

Numerical stability of TRSM

TRSM involves division, and the triangular matrices may be ill-conditioned. For more information, see 8.Numerical stability in TRSM.
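
To illustrate why a triangular solve can be numerically delicate, here is a minimal CPU sketch of forward substitution in plain C++. It is not the rocBLAS TRSM implementation and makes no rocBLAS calls; the names `solve_lower` and `make_growth_matrix` are hypothetical helpers introduced only for this illustration. The test matrix (1 on the diagonal, -2 below it) is an assumed example of an ill-conditioned triangular matrix: its entries are small, yet the solution grows like 3^i.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major n x n matrix; only the lower triangle is read.
using Matrix = std::vector<std::vector<double>>;

// Solve L * x = b by forward substitution, where L is lower triangular.
// Every row ends with a division by the diagonal entry L[i][i].
std::vector<double> solve_lower(const Matrix& L, const std::vector<double>& b)
{
    const std::size_t n = b.size();
    assert(L.size() == n);
    std::vector<double> x(n);
    for(std::size_t i = 0; i < n; ++i)
    {
        double sum = b[i];
        for(std::size_t j = 0; j < i; ++j)
            sum -= L[i][j] * x[j]; // subtract the already-solved unknowns
        x[i] = sum / L[i][i];      // the division mentioned above
    }
    return x;
}

// Hypothetical test matrix: 1 on the diagonal, -2 everywhere below it.
Matrix make_growth_matrix(std::size_t n)
{
    Matrix L(n, std::vector<double>(n, 0.0));
    for(std::size_t i = 0; i < n; ++i)
    {
        for(std::size_t j = 0; j < i; ++j)
            L[i][j] = -2.0;
        L[i][i] = 1.0;
    }
    return L;
}
```

Calling `solve_lower(make_growth_matrix(21), b)` with `b` set to all ones returns `x[i] = 3^i`, so the last component is 3^20 = 3486784401 even though no entry of the matrix exceeds 2 in magnitude. A small error in `b` is amplified by a similar factor, which is the kind of behavior to keep in mind when interpreting TRSM results.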

Profile rocBLAS kernels

Environment variables that can be set for profiling are described in 9.Profile rocBLAS kernels.
