rocBLAS 3.1.0 for ROCm 5.7.0

@rocm-ci rocm-ci released this 15 Sep 17:29
b80e422

Added

  • YAML lock-step argument scanning for the rocblas-bench and rocblas-test clients. See the Programmers Guide for details.
  • rocblas-gemm-tune, a tool that finds the best-performing GEMM kernel for each problem in a given set of GEMM problems.

Fixed

  • made offset calculations for rocBLAS functions 64-bit safe. This fixes potential overflow caused by very large leading dimensions or increments in:
    • Level 1: axpy, copy, rot, rotm, scal, swap, asum, dot, iamax, iamin, nrm2
    • Level 2: gemv, symv, hemv, trmv, ger, syr, her, syr2, her2, trsv
    • Level 3: gemm, symm, hemm, trmm, syrk, herk, syr2k, her2k, syrkx, herkx, trsm, trtri, dgmm, geam
    • General: set_vector, get_vector, set_matrix, get_matrix
    • Related fixes: internal scalar loads with offsets larger than 32 bits
    • fixed in-place functionality for all trtri sizes
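
The overflow class addressed by the 64-bit offset fix can be illustrated with a minimal sketch. The two helper functions below are hypothetical and are not part of rocBLAS; they only assume column-major storage, where element (row, col) of a matrix with leading dimension lda sits at offset row + col * lda.

```cpp
#include <cstdint>

// Hypothetical helper (not a rocBLAS function): computes the column-major
// element offset using 32-bit arithmetic only. Unsigned types are used so
// that the wraparound is well defined and reproducible.
std::int64_t offset_wrapping32(std::int32_t row, std::int32_t col, std::int32_t lda)
{
    std::uint32_t wrapped = static_cast<std::uint32_t>(row)
                            + static_cast<std::uint32_t>(col) * static_cast<std::uint32_t>(lda);
    return static_cast<std::int64_t>(wrapped);
}

// Hypothetical helper (not a rocBLAS function): widens to 64 bits before
// multiplying, so the product col * lda cannot wrap for 32-bit inputs.
std::int64_t offset_64(std::int32_t row, std::int32_t col, std::int32_t lda)
{
    return static_cast<std::int64_t>(row) + static_cast<std::int64_t>(col) * lda;
}
```

With row = 5, col = 4096 and lda = 1 << 20, the product col * lda equals 2^32, so offset_wrapping32 returns 5 while offset_64 returns 4294967301. Widening before the multiplication is the important step; widening only the final result would preserve the wrapped value.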

Changed

  • dot is now synchronous when using rocblas_pointer_mode_host, matching legacy BLAS, because it stores its result in host memory
  • enhanced reporting of installation issues caused by runtime libraries (Tensile)
  • standardized the internal rocBLAS C++ interface across most functions

Deprecated

  • the __STDC_WANT_IEC_60559_TYPES_EXT__ define will be removed in a future release

Dependencies

  • optional use of AOCL BLIS 4.0 on Linux for clients
  • optional build-tool-only dependency on Python psutil