Releases: ROCm/rocBLAS

rocBLAS 3.0.0 for ROCm 5.6.1

29 Aug 20:12
4b0751e

rocBLAS code for ROCm 5.6.1 did not change. The library was rebuilt for the updated ROCm 5.6.1 stack.

rocBLAS 3.0.0 for ROCm 5.6.0

28 Jun 23:17
4b0751e

Optimizations

  • Improved performance of Level 2 rocBLAS GEMV on gfx90a GPU for non-transposed problems having small matrices and larger batch counts. Performance enhanced for problem sizes when m and n <= 32 and batch_count >= 256.
  • Improved performance of rocBLAS syr2k for single, double, and double-complex precision, and her2k for double-complex precision. Slightly improved performance for general sizes on gfx90a.

Added

  • Added bf16 inputs and f32 compute support to Level 1 rocBLAS Extension functions axpy_ex, scal_ex and nrm2_ex.
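
As a rough illustration of what "bf16 inputs and f32 compute" means for an axpy-style operation, the sketch below widens each bf16 value to f32, does the arithmetic in f32, and rounds the result back to bf16. It is a host-side model only, not the rocBLAS kernel: `bf16_t`, `bf16_to_float`, `float_to_bf16` and `axpy_bf16_f32` are hypothetical names introduced here, and unit stride is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative bf16 storage type (hypothetical; rocBLAS ships its own rocblas_bfloat16).
struct bf16_t
{
    std::uint16_t bits;
};

// Widen bf16 to f32: a bf16 value is the upper 16 bits of an IEEE-754 binary32 value.
inline float bf16_to_float(bf16_t v)
{
    std::uint32_t u = static_cast<std::uint32_t>(v.bits) << 16;
    float         f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Narrow f32 to bf16 with round-to-nearest-even.
// NaN handling is omitted to keep the sketch short.
inline bf16_t float_to_bf16(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    u += 0x7FFFu + ((u >> 16) & 1u);
    return bf16_t{static_cast<std::uint16_t>(u >> 16)};
}

// y := alpha * x + y with bf16 storage and f32 arithmetic (unit stride only).
inline void axpy_bf16_f32(std::size_t n, float alpha, const bf16_t* x, bf16_t* y)
{
    for(std::size_t i = 0; i < n; ++i)
    {
        float acc = alpha * bf16_to_float(x[i]) + bf16_to_float(y[i]);
        y[i]      = float_to_bf16(acc);
    }
}
```

Keeping the accumulation in f32 means the only precision loss per element comes from the final rounding back to bf16.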

Deprecated

  • In-place trmm is deprecated. It will be replaced by a trmm that has both in-place and out-of-place functionality
  • rocblas_query_int8_layout_flag() is deprecated and will be removed in a future release
  • rocblas_gemm_flags_pack_int8x4 enum is deprecated and will be removed in a future release
  • rocblas_set_device_memory_size() is deprecated and will be replaced by a future function rocblas_increase_device_memory_size()
  • rocblas_is_user_managing_device_memory() is deprecated and will be removed in a future release

Removed

  • The is_complex helper was deprecated and is now removed. Use rocblas_is_complex instead.
  • The enum truncate_t and the value truncate were deprecated and are now removed. They were replaced by rocblas_truncate_t and rocblas_truncate, respectively.
  • rocblas_set_int8_type_for_hipblas was deprecated and is now removed.
  • rocblas_get_int8_type_for_hipblas was deprecated and is now removed.

Dependencies

  • Added a build-only dependency on python joblib, which is used by the Tensile build
  • Fixed cmake install on some operating systems when performed by install.sh -d --cmake_install

Fixed

  • Made trsm offset calculations 64-bit safe
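
The release note does not say which trsm offsets were affected, so the following is only a generic sketch of the class of problem: an element offset built from 32-bit dimensions can exceed the 32-bit range unless the operands are widened before multiplying. `element_offset` and `wrapped_product_32` are hypothetical helpers, and a column-major strided-batched layout is assumed.

```cpp
#include <cstdint>

// Hypothetical helper: offset of entry (row, col) of matrix `batch` in a
// column-major strided-batched layout. Every operand is widened to 64 bits
// before it is multiplied, so the products cannot wrap at 32 bits.
inline std::int64_t element_offset(std::int32_t row,
                                   std::int32_t col,
                                   std::int32_t lda,
                                   std::int64_t stride,
                                   std::int32_t batch)
{
    return static_cast<std::int64_t>(row)
           + static_cast<std::int64_t>(col) * lda
           + static_cast<std::int64_t>(batch) * stride;
}

// The same col * lda product evaluated in 32 bits, shown only for contrast.
// Unsigned arithmetic is used so the wraparound is well defined in this sketch.
inline std::int32_t wrapped_product_32(std::int32_t col, std::int32_t lda)
{
    std::uint32_t p = static_cast<std::uint32_t>(col) * static_cast<std::uint32_t>(lda);
    return static_cast<std::int32_t>(p);
}
```

For example, with `col = 50000` and `lda = 50000` the true offset is 2,500,000,000, which is above the largest 32-bit signed value (2,147,483,647); only the widened version returns it.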

Changed

  • Refactored rotg test code

rocBLAS 2.47.0 for ROCm 5.5.1

24 May 19:06
cdd561f

rocBLAS code for ROCm 5.5.1 did not change. The library was rebuilt for the updated ROCm 5.5.1 stack.

rocBLAS 2.47.0 for ROCm 5.5.0

01 May 21:04
cdd561f

Added

  • added functionality rocblas_geam_ex for matrix-matrix minimum operations
  • added HIP Graph support as a beta feature for rocBLAS Level 1, Level 2, and Level 3 (pointer mode host) functions
  • added beta features API, exposed using the compiler define ROCBLAS_BETA_FEATURES_API
  • added support for vector initialization in the rocBLAS test framework with negative increments
  • added Windows build documentation for forthcoming support using ROCm HIP SDK
  • added scripts to plot performance for multiple functions

Optimizations

  • improved performance of Level 2 rocBLAS GEMV for float and double precision. Performance enhanced by 150-200% for certain problem sizes when (m==n) measured on a gfx90a GPU.
  • improved performance of Level 2 rocBLAS GER for float, double and complex float precisions. Performance enhanced by 5-7% for certain problem sizes measured on a gfx90a GPU.
  • improved performance of Level 2 rocBLAS SYMV for float and double precisions. Performance enhanced by 120-150% for certain problem sizes measured on both gfx908 and gfx90a GPUs.

Fixed

  • fixed setting of executable mode on client script rocblas_gentest.py to avoid potential permission errors with clients rocblas-test and rocblas-bench
  • fixed deprecated API compatibility with Visual Studio compiler
  • fixed test framework memory exception handling for Level 2 functions when the host memory allocation exceeds the available memory

Changed

  • install.sh internally runs rmake.py (also used on Windows), and rmake.py may be used directly by developers on Linux (use --help)
  • rocblas client executables all now begin with rocblas- prefix

Removed

  • install.sh no longer accepts -o --cov, as Tensile now uses the default COV format, set by the cmake define Tensile_CODE_OBJECT_VERSION=default

rocBLAS 2.46.0 for ROCm 5.4.4

22 Mar 20:47
24f3891

rocBLAS code for ROCm 5.4.4 did not change. The library was rebuilt for the updated ROCm 5.4.4 stack.

rocBLAS 2.46.0 for ROCm 5.4.3

07 Feb 17:39
24f3891

rocBLAS code for ROCm 5.4.3 did not change. The library was rebuilt for the updated ROCm 5.4.3 stack.

rocBLAS 2.46.0 for ROCm 5.4.2

13 Jan 16:42
ef7a9bb

rocBLAS code for ROCm 5.4.2 did not change. The library was rebuilt for the updated ROCm 5.4.2 stack.

rocBLAS 2.46.0 for ROCm 5.4.1

15 Dec 18:39
ef7a9bb

rocBLAS code for ROCm 5.4.1 did not change. The library was rebuilt for the updated ROCm 5.4.1 stack.

rocBLAS 2.46.0 for ROCm 5.4.0

30 Nov 17:36
ef7a9bb

Added

  • client smoke test dataset added for quick validation using command rocblas-test --yaml rocblas_smoke.yaml
  • Added stream order device memory allocation as a non-default beta option.

Optimized

  • Improved trsm performance for small sizes by using a substitution method technique
  • Improved syr2k and her2k performance significantly by using a block-recursive algorithm
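
For readers unfamiliar with the substitution method mentioned for trsm, the sketch below shows plain forward substitution for a single right-hand side on the host. It illustrates the textbook technique only and makes no claim about how the rocBLAS GPU kernels are organized; `forward_substitution` is a hypothetical name, and a column-major lower-triangular matrix with a nonzero, non-unit diagonal is assumed.

```cpp
#include <cstddef>
#include <vector>

// Solve L * x = b in place (b is overwritten with x), where L is an n x n
// lower-triangular matrix stored column-major with leading dimension lda.
// Assumes every diagonal entry of L is nonzero.
inline void forward_substitution(std::size_t                n,
                                 const std::vector<double>& L,
                                 std::size_t                lda,
                                 std::vector<double>&       b)
{
    for(std::size_t j = 0; j < n; ++j)
    {
        // x[j] is fully determined once all earlier columns have been applied.
        b[j] /= L[j + j * lda];

        // Eliminate x[j] from the remaining equations.
        for(std::size_t i = j + 1; i < n; ++i)
            b[i] -= L[i + j * lda] * b[j];
    }
}
```

With L = [[2, 0], [1, 4]] and b = (2, 9), the call leaves b = (1, 2). Substitution avoids forming an explicit inverse, which is part of why it can suit small triangular systems.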

Changed

  • Level 2, Level 1, and Extension functions: argument checking when the handle is set to rocblas_pointer_mode_host now returns the status of rocblas_status_invalid_pointer only for pointers that must be dereferenced based on the alpha and beta argument values. With handle mode rocblas_pointer_mode_device only pointers that are always dereferenced regardless of alpha and beta values are checked and so may lead to a return status of rocblas_status_invalid_pointer. This improves consistency with legacy BLAS behaviour.
  • Add variable to turn on/off ieee16/ieee32 tests for mixed precision gemm
  • Allow hipBLAS to select int8 datatype
  • Disallow B == C && ldb != ldc in rocblas_xtrmm_outofplace
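
The pointer-mode argument checking change in the first item of this list can be modeled for an axpy-like call (y := alpha * x + y) as follows. This is a simplified sketch under stated assumptions, not rocBLAS source: the enums and `check_axpy_args` are hypothetical stand-ins for rocblas_pointer_mode_host/device and rocblas_status_success/invalid_pointer, and size and increment checks are left out.

```cpp
// Hypothetical stand-ins for the rocBLAS pointer-mode and status enums.
enum class pointer_mode { host, device };
enum class status { success, invalid_pointer };

// Simplified model of argument checking for y := alpha * x + y.
inline status check_axpy_args(pointer_mode mode,
                              const float* alpha,
                              const float* x,
                              const float* y)
{
    // alpha is always dereferenced, so it is checked in both modes.
    if(!alpha)
        return status::invalid_pointer;

    if(mode == pointer_mode::host)
    {
        // alpha is readable on the host: when it is zero, x and y are never
        // dereferenced, so null pointers for them are not an error.
        if(*alpha == 0.0f)
            return status::success;

        if(!x || !y)
            return status::invalid_pointer;
    }

    // Device mode: alpha lives in device memory and cannot be inspected here,
    // so in this model x and y are not validated.
    return status::success;
}
```

In this model, host mode with alpha equal to zero accepts null x and y, while host mode with a nonzero alpha rejects them, matching the legacy BLAS convention that alpha == 0 is a quick return.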

Fixed

  • FORTRAN interfaces generalized for FORTRAN compilers other than gfortran
  • fix for trsm_strided_batched rocblas-bench performance gathering
  • Fix for rocm-smi path in commandrunner.py script to match ROCm 5.2 and above

rocBLAS 2.45.0 for ROCm 5.3.3

17 Nov 19:21
7af9b04

rocBLAS code for ROCm 5.3.3 did not change. The library was rebuilt for the updated ROCm 5.3.3 stack.