Releases · ROCm/rocBLAS

Level 2 functions and level 3 trsm have additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments
Cache flush timing for gemm_batched_ex, gemm_strided_batched_ex, axpy
Benchmark class for common timing code
An environment variable "ROCBLAS_DEFAULT_ATOMICS_MODE" to set default atomics mode during creation of 'rocblas_handle'
Extended dot_ex to support single-precision (fp32_r) input and double-precision (fp64_r) output and compute types
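The benefit of a double-precision compute type for single-precision inputs can be illustrated without a GPU. The following is a minimal pure-Python sketch, not rocBLAS code: the helper names (to_f32, dot_fp32_compute, dot_fp64_compute) are hypothetical, and fp32 arithmetic is emulated by rounding after every operation.

```python
import struct


def to_f32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_fp32_compute(x, y):
    """Dot product with fp32 inputs and an emulated fp32 accumulator."""
    acc = 0.0
    for a, b in zip(x, y):
        prod = to_f32(to_f32(a) * to_f32(b))
        acc = to_f32(acc + prod)  # round after every add, as fp32 would
    return acc


def dot_fp64_compute(x, y):
    """Dot product with fp32 inputs but an fp64 accumulator and result."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += to_f32(a) * to_f32(b)  # product of two fp32 values is exact in fp64
    return acc


# The small term 1.0 is lost next to 1e8 in fp32, but survives in fp64.
x = [1e8, 1.0, -1e8]
y = [1.0, 1.0, 1.0]
result_fp32 = dot_fp32_compute(x, y)
result_fp64 = dot_fp64_compute(x, y)
```

Here the fp32 accumulator absorbs the 1.0 term because adjacent fp32 values near 1e8 are 8 apart, while the fp64 accumulator keeps it.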

Optimizations

Improved performance of Level 1 dot_batched and dot_strided_batched for all precisions. Performance improved by a factor of 6 for larger problem sizes, as measured on an MI210 GPU

Changes

Linux AOCL dependency updated to release 4.2 gcc build
Windows vcpkg dependencies updated to release 2024.02.14
Increased default device workspace from 32 to 128 MiB for architecture gfx9xx with xx >= 40
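The "gfx9xx with xx >= 40" notation in the workspace change can be read as a rule on the architecture name. The sketch below is one interpretation, not the library's implementation: default_workspace_bytes is a hypothetical helper, it treats xx as two decimal digits (so a name such as gfx90a does not match), and it assumes every other architecture keeps the 32 MiB default.

```python
import re

MIB = 1024 * 1024


def default_workspace_bytes(arch_name):
    """Return the assumed default device workspace size for an architecture name."""
    match = re.fullmatch(r"gfx9(\d{2})", arch_name)
    if match and int(match.group(1)) >= 40:
        return 128 * MIB  # gfx940 and later gfx9xx parts
    return 32 * MIB  # assumption: all other architectures keep 32 MiB


sizes = {name: default_workspace_bytes(name) for name in ("gfx908", "gfx90a", "gfx942")}
```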

Deprecations

rocblas_gemm_ex3, gemm_batched_ex3 and gemm_strided_batched_ex3 are deprecated and will be removed in the next major release of rocBLAS. Refer to hipBLASLt for future 8-bit float usage: https://github.com/ROCm/hipBLASLt

04 Jun 16:53

rocm-ci

rocm-6.1.2

8443539

rocBLAS 4.1.2 for ROCm 6.1.2

Fixes

Fixes BF16 TT get_solutions

Optimizations

Tune gfx942 BBS TN, TT

08 May 18:00

rocm-ci

rocm-6.1.1

5b85f2d

rocBLAS 4.1.0 for ROCm 6.1.1

rocBLAS code for ROCm 6.1.1 did not change. The library was rebuilt for the updated ROCm 6.1.1 stack.

16 Apr 19:10

rocm-ci

rocm-6.1.0

cefa4a9

rocBLAS 4.1.0 for ROCm 6.1.0

Additions

Level 1 and Level 1 Extension functions have additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments.
Cache flush timing for gemm_ex.
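The ILP64 entry points matter when a size or increment no longer fits in a 32-bit integer. The sketch below shows one way a caller might decide which name to use. It is illustrative only: pick_entry_point and needs_ilp64 are hypothetical helpers, and the only detail taken from the notes above is the _64 name suffix.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def needs_ilp64(*int_args):
    """Return True if any size or increment argument is outside the int32 range."""
    return any(not (INT32_MIN <= a <= INT32_MAX) for a in int_args)


def pick_entry_point(base_name, *int_args):
    """Append the _64 suffix only when an argument requires int64_t."""
    return base_name + "_64" if needs_ilp64(*int_args) else base_name


# A vector of 3 billion elements exceeds the int32 maximum of 2147483647.
small_call = pick_entry_point("rocblas_saxpy", 1000, 1)
large_call = pick_entry_point("rocblas_saxpy", 3_000_000_000, 1)
```

A caller could also use the _64 variant unconditionally; the selection shown here only makes explicit when the 32-bit argument types stop being sufficient.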

Changes

Some Level 2 function argument names have changed from 'm' to 'n' to match legacy BLAS; there was no change in implementation.
Standardized the use of non-blocking streams for copying results from device to host.

Fixes

Fixed host-pointer mode reductions for non-blocking streams.

31 Jan 20:12

rocm-ci

rocm-6.0.2

88df972

rocBLAS 4.0.0 for ROCm 6.0.2

rocBLAS code for ROCm 6.0.2 did not change. The library was rebuilt for the updated ROCm 6.0.2 stack.

15 Dec 18:30

rocm-ci

rocm-6.0.0

88df972

rocBLAS 4.0.0 for ROCm 6.0.0

Added

Beta API rocblas_gemm_batched_ex3 and rocblas_gemm_strided_batched_ex3
Added input/output type f16_r/bf16_r and execution type f32_r support for Level 2 gemv_batched and gemv_strided_batched
Added rocblas_status_excluded_from_build to be used when calling functions which require Tensile when using rocBLAS built without Tensile
Added system for async kernel launches setting a failure rocblas_status based on hipPeekAtLastError discrepancy

Optimized

Trsm performance for small sizes m < 32 && n < 32

Deprecated

In a future release atomic operations will be disabled by default so results will be repeatable. Atomic operations can always be enabled or disabled using the function rocblas_set_atomics_mode. Enabling atomic operations can improve performance.
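The link between atomic operations and repeatability comes from floating-point addition not being associative: when partial sums are combined in whatever order threads happen to finish, the rounded result can change from run to run. The following is a minimal CPU-side sketch of that effect in plain Python, not rocBLAS code; accumulate is a hypothetical helper.

```python
def accumulate(values):
    """Sum values strictly left to right, rounding after each addition."""
    total = 0.0
    for v in values:
        total += v
    return total


terms = [0.1, 0.2, 0.3]

# Same terms, two different accumulation orders.
forward = accumulate(terms)
reverse = accumulate(list(reversed(terms)))

# A fixed order always reproduces the same bits.
forward_again = accumulate(terms)
```

The two orders differ only in the last bit, which is why disabling atomics trades some performance for bitwise-repeatable results.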

Removed

rocblas_gemm_ext2 API function is removed
in-place trmm API from Legacy BLAS is removed. It is replaced by an API that supports both in-place and out-of-place trmm
int8x4 support is removed. int8 support is unchanged
The #define __STDC_WANT_IEC_60559_TYPES_EXT__ has been removed from rocblas-types.h. Users who want ISO/IEC TS 18661-3:2015 functionality must define __STDC_WANT_IEC_60559_TYPES_EXT__ before including float.h, math.h, and rocblas.h
The default build removes device code for gfx803 architecture from the fat binary

Fixed

Make offset calculations for rocBLAS functions 64 bit safe. Fixes for very large leading dimension or increment potentially causing overflow:
- Level 2: gbmv, gemv, hbmv, sbmv, spmv, tbmv, tpmv, tbsv, tpsv
Lazy loading to support heterogeneous architecture setup and load appropriate tensile library files based on the device's architecture
Guard against no-op kernel launches resulting in potential hipGetLastError

Changed

Default verbosity of rocblas-test reduced. To see all tests set environment variable GTEST_LISTENER=PASS_LINE_IN_LOG

13 Oct 18:57

rocm-ci

rocm-5.7.1

b80e422

rocBLAS 3.1.0 for ROCm 5.7.1

rocBLAS code for ROCm 5.7.1 did not change. The library was rebuilt for the updated ROCm 5.7.1 stack.

15 Sep 17:29

rocm-ci

rocm-5.7.0

b80e422

rocBLAS 3.1.0 for ROCm 5.7.0

Added

yaml lock step argument scanning for rocblas-bench and rocblas-test clients. See Programmers Guide for details.
rocblas-gemm-tune is used to find the best performing GEMM kernel for each of a given set of GEMM problems.
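As a rough illustration of lock step scanning, and assuming the term means that the scanned arguments advance together rather than forming every combination, the sketch below contrasts the two expansions. It does not reproduce the clients' YAML syntax (see the Programmers Guide for that); lock_step and cartesian are hypothetical helpers.

```python
from itertools import product


def lock_step(names, values):
    """One case per scanned value; every named argument takes that same value."""
    return [{name: v for name in names} for v in values]


def cartesian(names, values):
    """Every combination of values across the named arguments."""
    return [dict(zip(names, combo)) for combo in product(values, repeat=len(names))]


names = ["M", "N", "lda"]
scan_values = [32, 64, 96]

lock_cases = lock_step(names, scan_values)   # 3 cases
cross_cases = cartesian(names, scan_values)  # 27 cases
```

Lock step expansion keeps related arguments consistent (for example, lda tracking M) and grows linearly with the number of scanned values.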

Fixed

make offset calculations for rocBLAS functions 64 bit safe. Fixes for very large leading dimensions or increments potentially causing overflow:
- Level 1: axpy, copy, rot, rotm, scal, swap, asum, dot, iamax, iamin, nrm2
- Level 2: gemv, symv, hemv, trmv, ger, syr, her, syr2, her2, trsv
- Level 3: gemm, symm, hemm, trmm, syrk, herk, syr2k, her2k, syrkx, herkx, trsm, trtri, dgmm, geam
- General: set_vector, get_vector, set_matrix, get_matrix
- Related fixes: internal scalar loads with > 32bit offsets
- fix in-place functionality for all trtri sizes
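The overflow addressed by these fixes can be reproduced with plain integer arithmetic. The sketch below emulates a 32-bit signed offset calculation for a column-major matrix and compares it with a 64-bit safe one. It is an illustration under assumptions, not the rocBLAS kernels' code: wrap_int32, offset_32bit, and offset_64bit are hypothetical helpers.

```python
def wrap_int32(value):
    """Reduce an integer to the value a 32-bit signed int would hold."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def offset_32bit(col, lda, row=0):
    """Column-major element offset computed entirely in emulated int32."""
    return wrap_int32(wrap_int32(col * lda) + row)


def offset_64bit(col, lda, row=0):
    """The same offset with wide arithmetic (Python ints stand in for int64_t)."""
    return col * lda + row


# Each index fits in int32, but their product (2.8e9) exceeds 2**31 - 1.
col, lda = 40_000, 70_000
bad_offset = offset_32bit(col, lda)
good_offset = offset_64bit(col, lda)
```

The 32-bit result wraps to a negative offset even though both inputs are valid 32-bit values, which is why the product has to be formed in 64 bits.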

Changed

dot when using rocblas_pointer_mode_host is now synchronous to match legacy BLAS as it stores results in host memory
enhanced reporting of installation issues caused by runtime libraries (Tensile)
standardized internal rocblas C++ interface across most functions

Deprecated

Removal of __STDC_WANT_IEC_60559_TYPES_EXT__ define in future release

Dependencies

optional use of AOCL BLIS 4.0 on Linux for clients
optional build tool only dependency on python psutil
