You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This commit was created on github.com and signed with GitHub’s verified signature.
The key has expired.
Added
Added rocblas_get_version_string_size convenience function
Added rocblas_xtrmm_outofplace, an out-of-place version of rocblas_xtrmm
Added hpl and trig initialization for gemm_ex to rocblas-bench
Added source code gemm. It can be used as an alternative to Tensile for debugging and development
Added option ROCM_MATHLIBS_API_USE_HIP_COMPLEX to opt-in to use hipFloatComplex and hipDoubleComplex
Optimizations
Improved performance of non-batched and batched single-precision GER for size m > 1024. Performance enhanced by 5-10% measured on a MI100 (gfx908) GPU.
Improved performance of non-batched and batched HER for all sizes and data types. Performance enhanced by 2-17% measured on a MI100 (gfx908) GPU.
Changed
Instantiate templated rocBLAS functions to reduce size of librocblas.so
Removed static library dependency on msgpack
Removed boost dependencies for clients
Fixed
Option to install script to build only rocBLAS clients with a pre-built rocBLAS library
Correctly set output of nrm2_batched_ex and nrm2_strided_batched_ex when given bad input
Fix for dgmm with side == rocblas_side_left and a negative incx