Cory Bloor edited this page Jul 18, 2020 · 43 revisions

Building rocBLAS

To build the rocBLAS library and clients, see 1.Build.

Example code

For example code and a Makefile that use rocBLAS, see 2.Example.

Running client executables

For instructions on how to run/use the client code, see 3.Running.

Functionality, API

rocBLAS exports the functions listed in 4.Exported functions.

Logging functions

You can set environment variables that cause rocBLAS to output logging information for each rocBLAS call. Note that the output is streamed to standard error. Use logging only for diagnostics, because it slows down the code.
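As a rough illustration of driving logging from a launcher script, the sketch below builds an environment for a child process. It assumes the logging mode is selected through a `ROCBLAS_LAYER` bit mask with the values 1 (trace), 2 (bench), and 4 (profile). That name and those values are not stated on this page, so check them against the logging documentation before relying on them. The helper `logging_env` and the binary name in the comment are hypothetical.

```python
import os

# Assumed bit values for the ROCBLAS_LAYER mask (verify against the logging docs).
TRACE, BENCH, PROFILE = 1, 2, 4


def logging_env(layers, base_env=None):
    """Return a copy of the environment with rocBLAS logging requested.

    `layers` is a bitwise OR of the assumed mask values above.
    `base_env` defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROCBLAS_LAYER"] = str(layers)
    return env


# Request trace and bench logging together; start from an empty environment
# here so the result is easy to inspect.
env = logging_env(TRACE | BENCH, base_env={})
print(env)

# Hypothetical usage: run an application with logging enabled and capture
# standard error, where the log output is streamed.
#   import subprocess
#   with open("rocblas.log", "w") as log:
#       subprocess.run(["./my_rocblas_app"], env=logging_env(TRACE), stderr=log)
```

Setting the variable only in the child's environment keeps logging scoped to the diagnostic run, so normal runs do not pay the slowdown.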

Contributing

See 6.Contributing for guidelines on contributing to rocBLAS.

Training

rocBLAS uses Tensile for its GEMM functions. Tensile can be tuned for specific GEMM sizes. rocBLAS ships with default training, and most users will never need to train. Information on training is in 7.Train.

Device and stream management in rocBLAS and HIP

For information on device and stream management, see 8.Device and stream management.

Numerical stability of TRSM

TRSM involves division, and the triangular matrices may be ill-conditioned. For more information, see 9.Numerical stability in TRSM.
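To make the ill-conditioning concern concrete, here is a minimal pure-Python sketch of a triangular solve by forward substitution. It is not rocBLAS code and does not call the rocBLAS API. The test matrix (ones on the diagonal, minus ones below it) is an illustrative choice, and the function names are hypothetical. It shows how a tiny perturbation in the right-hand side can be amplified heavily in the solution.

```python
def forward_substitution(L, b):
    """Solve L x = b for a lower-triangular matrix L given as a list of rows."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # Subtract the contribution of the already-computed unknowns.
        s = sum(L[i][j] * x[j] for j in range(i))
        # Division by the diagonal entry: a tiny diagonal would blow this up.
        x[i] = (b[i] - s) / L[i][i]
    return x


def ill_conditioned_lower(n):
    """Lower-triangular matrix with 1 on the diagonal and -1 below it."""
    return [
        [1.0 if j == i else (-1.0 if j < i else 0.0) for j in range(n)]
        for i in range(n)
    ]


n = 30
L = ill_conditioned_lower(n)
delta = 2.0 ** -20  # small perturbation applied to the first entry of b

b = [0.0] * n
b_perturbed = [delta] + [0.0] * (n - 1)

x = forward_substitution(L, b)
x_perturbed = forward_substitution(L, b_perturbed)

# Ratio of the change in the last solution entry to the input perturbation.
amplification = abs(x_perturbed[-1] - x[-1]) / delta
print(amplification)
```

For this matrix the change in the last entry doubles with each additional row, so every entry of the matrix is harmless in magnitude and yet the solve is highly sensitive to its input.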

Profile rocBLAS kernels

Environment variables that can be set for profiling are described in 10.Profile rocBLAS kernels.
