rocBLAS 4.2.0 for ROCm 6.2.0

Released by rocm-ci on 02 Aug 16:15 (commit 54f305c)

Additions

  • Level 2 functions and Level 3 trsm now have an additional ILP64 API for both C and FORTRAN (_64 name suffix) that takes int64_t function arguments
  • Cache flush timing for gemm_batched_ex, gemm_strided_batched_ex, and axpy
  • A benchmark class for common timing code
  • The environment variable ROCBLAS_DEFAULT_ATOMICS_MODE, which sets the default atomics mode when a rocblas_handle is created
  • Extended dot_ex to support single-precision (fp32_r) input with double-precision (fp64_r) output and compute types
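As a minimal sketch of what the _64 variants compute, the host-side C reference that follows performs a non-transposed, column-major gemv with every dimension, leading dimension, and increment typed as int64_t. The name ref_sgemv_n_64 is hypothetical and is not part of rocBLAS; under the _64 suffix convention described above, the corresponding device routine would be named rocblas_sgemv_64, but check the rocBLAS API reference for its exact signature.

```c
#include <stdint.h>

/* Hypothetical host reference (not a rocBLAS function): computes
 * y := alpha * A * x + beta * y for a column-major m x n matrix A,
 * passing every size, leading dimension and increment as int64_t,
 * mirroring the argument types of the _64 ILP64 variants.
 * Assumes incx > 0 and incy > 0. */
void ref_sgemv_n_64(int64_t m, int64_t n, float alpha,
                    const float *A, int64_t lda,
                    const float *x, int64_t incx,
                    float beta, float *y, int64_t incy)
{
    for (int64_t i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (int64_t j = 0; j < n; ++j) {
            /* 64-bit index arithmetic: i + j * lda can exceed INT32_MAX
             * for large matrices even when m, n and lda each fit in 32 bits. */
            acc += A[i + j * lda] * x[j * incx];
        }
        /* BLAS convention: when beta == 0, the old y is not read. */
        float prev = (beta == 0.0f) ? 0.0f : beta * y[i * incy];
        y[i * incy] = alpha * acc + prev;
    }
}
```

For example, with A stored column-major as {1, 2, 3, 4} (lda = 2), x = {1, 1}, alpha = 1, and beta = 0, the sketch writes y = {4, 6}.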
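To illustrate the new dot_ex type combination, here is a host-side C sketch that reads float inputs but forms and accumulates every product in double. The name ref_dot_f32_in_f64_out is hypothetical and not a rocBLAS function. On the device, the same combination should correspond to passing rocblas_datatype_f32_r as the x and y types and rocblas_datatype_f64_r as the result and execution types of rocblas_dot_ex; treat that mapping as an assumption to verify against the API reference.

```c
#include <stdint.h>

/* Hypothetical host reference (not a rocBLAS function) for the
 * fp32 input / fp64 compute and output combination: each float
 * product is formed and accumulated in double precision.
 * Assumes incx > 0 and incy > 0. */
double ref_dot_f32_in_f64_out(int64_t n, const float *x, int64_t incx,
                              const float *y, int64_t incy)
{
    double acc = 0.0;
    for (int64_t i = 0; i < n; ++i)
        acc += (double)x[i * incx] * (double)y[i * incy];
    return acc;
}
```

For example, with x = {1e8f, 1.0f, -1e8f} and y = {1.0f, 1.0f, 1.0f}, the sketch returns exactly 1.0, whereas a strictly single-precision accumulator would round 1e8 + 1 back to 1e8 and finish at 0.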

Optimizations

  • Improved performance of Level 1 dot_batched and dot_strided_batched for all precisions; for larger problem sizes, performance improved by a factor of 6, as measured on an MI210 GPU
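For readers unfamiliar with the strided-batched layout, this host-side C sketch shows what a strided-batched single-precision dot computes: batch_count independent dot products, with vector b of x starting at x + b * stride_x and vector b of y at y + b * stride_y. The name ref_sdot_strided_batched is hypothetical. The sketch models only the result, not the optimized GPU kernels behind the performance figure above; dot_batched differs in taking an array of per-batch pointers instead of a base pointer plus stride.

```c
#include <stdint.h>

/* Hypothetical host reference (not a rocBLAS function) for a
 * strided-batched dot: batch_count independent dot products, where
 * vector b of x starts at x + b * stride_x and vector b of y at
 * y + b * stride_y. Assumes incx > 0 and incy > 0. */
void ref_sdot_strided_batched(int64_t n,
                              const float *x, int64_t incx, int64_t stride_x,
                              const float *y, int64_t incy, int64_t stride_y,
                              int64_t batch_count, float *results)
{
    for (int64_t b = 0; b < batch_count; ++b) {
        const float *xb = x + b * stride_x; /* start of batch b in x */
        const float *yb = y + b * stride_y; /* start of batch b in y */
        float acc = 0.0f;
        for (int64_t i = 0; i < n; ++i)
            acc += xb[i * incx] * yb[i * incy];
        results[b] = acc;
    }
}
```

For example, with n = 2, x = {1, 2, 3, 4}, y = {5, 6, 7, 8}, unit increments, and both strides equal to 2, the two results are 17 and 53.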

Changes

  • Linux AOCL dependency updated to the release 4.2 gcc build
  • Windows vcpkg dependencies updated to release 2024.02.14
  • Increased the default device workspace from 32 MiB to 128 MiB for gfx9xx architectures with xx >= 40
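The architecture rule above can be made concrete with a small helper, default_workspace_bytes, which is hypothetical and not a rocBLAS function. It assumes that the two characters after "gfx9" are compared as a hexadecimal value against 0x40 (so names such as gfx90a still parse) and that all other architectures keep the previous 32 MiB default.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of rocBLAS) encoding the stated rule:
 * 128 MiB default workspace for gfx9xx with xx >= 40, and (assumed)
 * 32 MiB otherwise. The two characters after "gfx9" are parsed as hex
 * so that names such as "gfx90a" are handled (0x0a < 0x40). Feature
 * suffixes such as ":sramecc+" after the name are ignored. */
size_t default_workspace_bytes(const char *arch)
{
    const size_t MiB = (size_t)1 << 20;
    if (strncmp(arch, "gfx9", 4) == 0 && strlen(arch) >= 6) {
        char xx[3] = { arch[4], arch[5], '\0' };
        char *end = NULL;
        long v = strtol(xx, &end, 16);
        /* Require both characters to be consumed as hex digits. */
        if (end == xx + 2 && v >= 0x40)
            return 128 * MiB;
    }
    return 32 * MiB;
}
```

Under these assumptions, gfx940 and gfx942 get 128 MiB (134217728 bytes), while gfx90a and gfx908 stay at 32 MiB (33554432 bytes).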

Deprecations

  • rocblas_gemm_ex3, rocblas_gemm_batched_ex3, and rocblas_gemm_strided_batched_ex3 are deprecated and will be removed in the next major release of rocBLAS. For future 8-bit float usage, please refer to hipBLASLt: https://github.com/ROCm/hipBLASLt